getting indices when comparing multidimensional arrays - python

I have two numpy arrays, one an RGB image, one a lookup table of pixel values, for example:
img = np.random.randint(0, 9 , (3, 3, 3))
lut = np.random.randint(0, 9, (1,3,3))
What I'd like is to know the x,y coordinate in lut of pixels whose values are common to img and lut, so I tried:
for x in xrange(img.shape[0]):
    for y in xrange(img.shape[1]):
        print np.transpose(np.concatenate(np.where(lut == img[x,y])))
At this point, the problem is that img[x,y], which has the form [int_r, int_g, int_b], is not treated as a single element, so its three components are matched separately in lut...
I would like the output to be something like:
(x_coord, y_coord)
But I only get output in the form of:
[0 0 0]
[0 2 1]
[0 0 2]
[0 0 0]
[0 0 0]
[0 0 2]
[0 0 1]
[0 2 2]
[0 1 2]
Can anyone please help? Thanks!

img = np.random.randint(0, 9 , (3, 3, 3))
lut2 = img[1,2,:] # so that we know exactly the answer
# compare two matrices
img == lut2
array([[[False, False, False],
        [False, False, False],
        [False,  True, False]],

       [[False, False, False],
        [False, False, False],
        [ True,  True,  True]],

       [[ True, False, False],
        [ True, False, False],
        [False, False, False]]], dtype=bool)
# rows with all true are the matching ones
np.where( (img == lut2).sum(axis=2) == 3 )
(array([1]), array([2]))
I don't really know why lut is filled with random numbers, but I assume that you want to look for the pixels that have exactly the same color. If so, this seems to work. Is this what you need to do?
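A minimal variant of the same idea: since the condition is "all three channels are equal", (img == lut2).all(axis=2) expresses that directly and avoids hard-coding the channel count; this is just a sketch of the equivalent call:
import numpy as np
img = np.random.randint(0, 9, (3, 3, 3))
lut2 = img[1, 2, :]  # known match, as above
# rows/cols where all channels are equal, same result as .sum(axis=2) == 3
np.where((img == lut2).all(axis=2))
# (array([1]), array([2]))  -- plus any accidental duplicates from the random data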

@otterb's answer works if lut is defined as a single [r,g,b] pixel slice, but it needs to be tweaked a little if you want to generalize this process to a multi-pixel lut:
img = np.random.randint(0, 9 , (3, 3, 3))
lut2 = img[0:1,0:2,:]
for x in xrange(lut2.shape[0]):
    for y in xrange(lut2.shape[1]):
        print lut2[x,y]
        print np.concatenate(np.where( (img == lut2[x,y]).sum(axis=2) == 3 ))
yields:
[1 1 7]
[0 0]
[8 7 4]
[0 1]
where triplets are pixel values from the lut, and doublets are the coordinates in img where those values occur (here the same as their lut coordinates, since lut2 is a slice of img).
Cheers, and thanks to @otterb!
PS: iteration over numpy arrays is bad. The above is not production code.
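For completeness, a fully vectorized sketch (not from the original answer) that compares every img pixel against every lut pixel via broadcasting; note that it builds an (H, W, h, w) boolean array, so it trades memory for speed:
import numpy as np
img = np.random.randint(0, 9, (3, 3, 3))
lut = img[0:1, 0:2, :].copy()  # (1, 2, 3) lut sliced from img, as above
# (H, W, 1, 1, 3) == (h, w, 3) -> (H, W, h, w, 3), then require all 3 channels equal
matches = (img[:, :, None, None, :] == lut).all(axis=-1)
img_y, img_x, lut_y, lut_x = np.nonzero(matches)
for iy, ix, ly, lx in zip(img_y, img_x, lut_y, lut_x):
    print(img[iy, ix], (iy, ix), "matches lut", (ly, lx))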

Related

How do I pass a list as changing condition in an array?

Let's say that I have a numpy array a = [1 2 3 4 5 6 7 8] and I want to change everything except 1, 2 and 3 to 0. With a list b = [1,2,3] I tried a[a not in b] = 0, but Python does not accept this. Currently I'm using a for loop like this:
c = np.unique(a)
for i in c:
    if i not in b:
        a[a == i] = 0
This works very slowly (around 900 different values in a 3D array of roughly 1000x1000x1000) and doesn't feel like the optimal solution for numpy. Is there a better way to do it in numpy?
You can use numpy.isin() to create a boolean mask to use as an index:
np.isin(a, b)
# array([ True, True, True, False, False, False, False, False])
Use ~ to do the opposite:
~np.isin(a, b)
# array([False, False, False, True, True, True, True, True])
Using this to index the original array lets you assign zero to the specific elements:
a = np.array([1,2,3,4,5,6,7,8])
b = np.array([1, 2, 3])
a[~np.isin(a, b)] = 0
print(a)
# [1 2 3 0 0 0 0 0]
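If you would rather not modify a in place, the same mask works with np.where to build a new array; a small sketch:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array([1, 2, 3])
# keep members of b, replace everything else with 0
result = np.where(np.isin(a, b), a, 0)
print(result)
# [1 2 3 0 0 0 0 0]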

Apply numpy 'where' along one of axes

I have an array like that:
array = np.array([
    [True, False],
    [True, False],
    [True, False],
    [True, True],
])
I would like to find the last occurrence of True for each row of the array.
If it was 1d array I would do it in this way:
np.where(array)[0][-1]
How do I do something similar in 2D? Kind of like:
np.where(array, axis = 1)[0][:,-1]
but there is no axis argument in np.where.
Since True is greater than False, find the position of the largest element in each row. Unfortunately, argmax finds the first largest element, not the last one. So reverse each row, find the first True from the end, and recalculate the indices:
(array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)
# array([0, 0, 0, 1])
The method fails if there are no True values in a row. You can check if that's the case by dividing by array.max(axis=1). A row with no Trues will have its last True at infinity :)
array[0, 0] = False
((array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)) / array.max(axis=1)
#array([inf, 0., 0., 1.])
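If a plain -1 marker is preferred over inf for rows without any True, one possible variant (just a sketch combining the same reversed argmax with array.any and np.where):
import numpy as np
array = np.array([[False, False],
                  [True, False],
                  [True, False],
                  [True, True]])
last = (array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)
# mark rows that contain no True at all with -1
last = np.where(array.any(axis=1), last, -1)
print(last)
# [-1  0  0  1]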
I found an older answer but didn't like that it returns 0 both for a True in the first position and for a row with no True at all.
So here's a way to solve that problem, if it's important to you:
import numpy as np
arr = np.array([[False, False, False],  # -1
                [False, False, True],   # 2
                [True, False, False],   # 0
                [True, False, True],    # 2
                [True, True, False],    # 1
                [True, True, True],     # 2
                ])
# Make an adjustment for no Trues at all.
adj = np.sum(arr, axis=1) == 0
# Get the position and adjust.
x = np.argmax(np.cumsum(arr, axis=1), axis=1) - adj
# Compare to expected result:
assert np.all(x == np.array([-1, 2, 0, 2, 1, 2]))
print(x)
Gives [-1 2 0 2 1 2].

Create a mask from labels to compute loss with numpy

I'm having trouble creating a mask without using a for loop.
I've got a numpy array of size N with my labels and I want to create a mask of size NxN where mask[i, j] = True if and only if y[i] == y[j].
I've managed to do so using a for loop:
mask = np.asarray([np.where(y==y[k], 1, 0) for k in range(len(y))])
But I'm working on a GPU and this greatly increases the compute time. How can I do it without looping?
This might get you started:
n = 3
a = np.arange(n)
np.equal.outer(a, a)
# this is the same as
a[:,None] == a
Output:
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])
This is basically comparing every pair of elements from the Cartesian product: a[0] == a[0], a[0] == a[1], a[0] == a[2], and so forth, which is why the diagonal values are True when using np.arange.
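Applied to the original question (y here is just a made-up label array), the same broadcasting one-liner gives the full NxN mask directly:
import numpy as np
y = np.array([0, 1, 0, 2])  # hypothetical labels
mask = y[:, None] == y      # or: np.equal.outer(y, y)
print(mask)
# [[ True False  True False]
#  [False  True False False]
#  [ True False  True False]
#  [False False False  True]]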
You can use np.repeat and .T. Here a and b are just arbitrary data - the labels in your case.
import numpy as np
size = 4
a = np.arange(size)[:, None]
b = a.T  # note: b is a view of a, so the next assignment also modifies a
b[0, 2] = 1
c = np.repeat(a.T, repeats=size, axis=0)
d = np.repeat(b, repeats=size, axis=0).T
print(c)
print(d)
e = np.equal(c, d)
print(e)
out:
[[0 1 1 3]
 [0 1 1 3]
 [0 1 1 3]
 [0 1 1 3]]
[[0 0 0 0]
 [1 1 1 1]
 [1 1 1 1]
 [3 3 3 3]]
[[ True False False False]
 [False  True  True False]
 [False  True  True False]
 [False False False  True]]
For problems like these, np.indices is your friend:
dims = (len(y), len(y))
inds = np.indices(dims)
mask = np.empty(dims, dtype=bool)
mask[inds[0], inds[1]] = y[inds[0]] == y[inds[1]]
edit:
Kevin's more specific solution is more concise and almost certainly faster than this method.

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
#ALL THE COORDINATES WITHIN x-> 0 to 4 AND y-> 0 to 4 SHOULD
#BE PUT IN b (x and y ranges might not be equal)
b = #DO SOME OPERATION
>>> b
[[3,4],
 [0,0]]
If the range is the same for both directions, x and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using numpy.ndarray.all, you get whether all values are truthy, similar to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to compare values only along one (or more) axes of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output has the same shape as the array, except that every dimension listed in axis has been contracted to a single True/False.
When indexing an array with a boolean mask, the mask is applied along the leading axes it covers. Since we index an array of shape (7, 2) with a mask of shape (7,), the mask selects along the first axis, so these values select rows of the original array.
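If the check is needed in more than one place, it folds naturally into a small helper; points_in_box is just an illustrative name, not an existing NumPy function:
import numpy as np
def points_in_box(points, mins, maxs):
    # return the rows of points whose coordinates all lie within [mins, maxs]
    points = np.asarray(points)
    keep = (points >= mins).all(axis=1) & (points <= maxs).all(axis=1)
    return points[keep]
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
print(points_in_box(a, (0, 0), (4, 4)))
# [[3 4]
#  [0 0]]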

Get first index in array where relation is true

How can I get the last index of the element in a where b > a, when a and b have different lengths, using numpy?
For instance, for the following values:
>>> a = np.asarray([10, 20, 30, 40])
>>> b = np.asarray([12, 25])
I would expect a result of [0, 1] (0 because 12 > 10 -> index 0 in a; 1 because 25 > 20 -> index 1 in a). Obviously, the length of the result vector should equal the length of b, and the values of the result list should be less than the length of a, as they refer to indices in a.
Another test is for b = np.asarray([12, 25, 31, 9, 99]) (same a as above), the result should be array([ 0, 1, 2, -1, 3]).
A vectorized solution:
Remember that you can compare all elements in b with all elements in a using broadcasting:
b[:, None] > a
# array([[ True, False, False, False],   # b[0] > a[:]
#        [ True,  True, False, False]])  # b[1] > a[:]
And now find the index of the last True value in each row, which equals the index of the first False value in each row, minus 1:
np.argmin((b[:, None] > a), axis=1) - 1
# array([0, 1])
Note that there might be an ambiguity as to what a returned value of -1 means. It could mean
b[x] was larger than all elements in a, or
b[x] was not larger than any element in a
In our data, this means
a = np.asarray([10, 20, 30, 40])
b = np.asarray([9, 12, 25, 39, 40, 41, 50])
mask = b[:, None] > a
# array([[False, False, False, False],  # 9 is smaller than a[:], case 2
#        [ True, False, False, False],
#        [ True, False, False, False],
#        [ True,  True,  True, False],
#        [ True,  True,  True, False],
#        [ True,  True,  True,  True],  # 41 is larger than a[:], case 1
#        [ True,  True,  True,  True]]) # 50 is larger than a[:], case 1
So for case 1 we need to find rows with all True values:
is_max = np.all(mask, axis=1)
And for case 2 we need to find rows with all False values:
none_found = np.all(~mask, axis=1)
This means we can use is_max to find and replace all case 1 -1 values with the correct positive index:
mask = b[:, None] > a
is_max = np.all(mask, axis=1)
# array([False, False, False, False, False, True, True])
idx = np.argmin(mask, axis=1) - 1
# array([-1, 0, 0, 2, 2, -1, -1])
idx[is_max] = len(a) - 1
# array([-1, 0, 0, 2, 2, 3, 3])
However be aware that the index -1 has a meaning: Just like 3 it already means "the last element". So if you want to use idx for indexing, keeping -1 as an invalid value marker may cause trouble down the line.
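Putting the broadcast mask, the argmin step, and the all-True fix together, here is a sketch of the whole thing as one function (last_smaller_index is just an illustrative name); it reproduces the expected result from the question:
import numpy as np
def last_smaller_index(a, b):
    # for each b[i], return the last index j with b[i] > a[j], or -1 if there is none
    mask = b[:, None] > a
    idx = np.argmin(mask, axis=1) - 1   # index of the first False per row, minus 1
    idx[mask.all(axis=1)] = len(a) - 1  # rows where b[i] is larger than every a[j]
    return idx
a = np.asarray([10, 20, 30, 40])
b = np.asarray([12, 25, 31, 9, 99])
print(last_smaller_index(a, b))
# [ 0  1  2 -1  3]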
This also works when a and b have different lengths: iterate over the shorter length and compare the pairs element-wise:
[i for i in range(min(len(a), len(b))) if b[i] > a[i]]
# [0, 1]
You can zip a and b to combine them and then enumerate to iterate over the pairs with their index:
[i for i,(x,y) in enumerate(zip(a,b)) if y>x]
# [0, 1]
np.asarray([i for i in range(len(b)) if b[i]>a[i]])
This should give you the answer. Note that the result's length does not have to match that of either a or b.
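As an aside, and not part of any answer above: when a happens to be sorted (as in the question's example), np.searchsorted gives another vectorized route; this sketch assumes sortedness:
import numpy as np
a = np.asarray([10, 20, 30, 40])  # assumed sorted ascending
b = np.asarray([12, 25, 31, 9, 99])
# the number of a-elements strictly smaller than each b[i], minus 1,
# is the last index j with b[i] > a[j]; -1 means no such element
idx = np.searchsorted(a, b, side='left') - 1
print(idx)
# [ 0  1  2 -1  3]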
