Create a mask from labels to compute loss with numpy - python

I'm having trouble creating a mask without using a for loop.
I've got a numpy array of size N with my labels and I want to create a mask of size NxN where mask[i, j] = True if and only if y[i] == y[j].
I've managed to do so by using a for loop:
mask = np.asarray([np.where(y==y[k], 1, 0) for k in range(len(y))])
But I'm working on a GPU and this greatly increases the compute time. How can I do it without looping?

This might get you started:
import numpy as np
n = 3
a = np.arange(n)
np.equal.outer(a, a)
# this is the same as
a[:,None] == a
Output:
array([[ True, False, False],
[False, True, False],
[False, False, True]])
This is basically comparing the elements from a cartesian product: a[0] == a[0], a[0] == a[1], a[0] == a[2], and so forth, which is why the diagonal values are True when using np.arange.
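Applied to the labels from the question, this becomes a one-liner (a minimal sketch, assuming y is the 1-D label array):
mask = y[:, None] == y        # NxN boolean array, mask[i, j] == (y[i] == y[j])
# equivalently: mask = np.equal.outer(y, y)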

You can use np.repeat and .T
a and b are just arbitrary data - the labels in your case.
import numpy as np
size = 4
a = np.arange(size)[:, None]
b = a.T
b[0, 2] = 1
c = np.repeat(a.T, repeats=size, axis=0)
d = np.repeat(b, repeats=size, axis=0).T
print(c)
print(d)
e = np.equal(c, d)
print(e)
out:
[[0 1 1 3]
[0 1 1 3]
[0 1 1 3]
[0 1 1 3]]
[[0 0 0 0]
[1 1 1 1]
[1 1 1 1]
[3 3 3 3]]
[[ True False False False]
[False True True False]
[False True True False]
[False False False True]]

For problems like these, np.indices is your friend:
dims = (len(y), len(y))
inds = np.indices(dims)
mask = y[inds[0]] == y[inds[1]]
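A quick sanity check (a sketch, using a hypothetical label array):
y = np.array([0, 1, 0])
inds = np.indices((len(y), len(y)))
y[inds[0]] == y[inds[1]]
# array([[ True, False,  True],
#        [False,  True, False],
#        [ True, False,  True]])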
edit:
Kevin's more specific solution is more concise and almost certainly faster than this method.

Related

How do I pass a list as changing condition in an array?

Let's say that I have a numpy array a = [1 2 3 4 5 6 7 8] and I want to change everything else but 1, 2 and 3 to 0. With a list b = [1,2,3] I tried a[a not in b] = 0, but Python does not accept this. Currently I'm using a for loop like this:
c = np.unique(a)
for i in c:
    if i not in b:
        a[a == i] = 0
Which works very slowly (around 900 different values in a 3D array of roughly 1000x1000x1000) and doesn't feel like the optimal solution for numpy. Is there a more optimal way of doing it in numpy?
You can use numpy.isin() to create a boolean mask to use as an index:
np.isin(a, b)
# array([ True, True, True, False, False, False, False, False])
Use ~ to do the opposite:
~np.isin(a, b)
# array([False, False, False, True, True, True, True, True])
Using this to index the original array lets you assign zero to the specific elements:
a = np.array([1,2,3,4,5,6,7,8])
b = np.array([1, 2, 3])
a[~np.isin(a, b)] = 0
print(a)
# [1 2 3 0 0 0 0 0]
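If you'd rather not modify a in place, np.where builds a new array instead (a small variant, not part of the original answer):
a = np.array([1,2,3,4,5,6,7,8])
c = np.where(np.isin(a, b), a, 0)
print(c)
# [1 2 3 0 0 0 0 0]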

loop through matrix using map() function in python

How can I use map() instead of the following nested for loop? The idea is to not use a for loop :)
def f(matrix):
    r, c = np.shape(matrix)
    for col in range(0, c):
        for row in range(0, r):
            if matrix[row][col] >= max(matrix[row]):
                print("true")
            else:
                print("false")
I tried to use something similar to this format but I am stuck:
print(list(map(lambda (x,y): print(x[y]) , A)))
but it's not working.
thank you :)
You can easily compare every element of a row to the row's max with np.where. Numpy has a built-in function called np.apply_along_axis which applies a 1D function along a given axis; it is equivalent to looping over your array, but more concise. Here is your solution on an example matrix with random elements:
import numpy as np
def max_comp(row):
    return np.where(row >= max(row), True, False)

matrix = np.random.randint(10, size=(5,5))
output = np.apply_along_axis(max_comp, axis=1, arr=matrix)
print(matrix)
print(output)
Out:
[[2 3 1 4 5]
[9 6 0 1 1]
[9 6 3 4 1]
[3 7 6 1 7]
[2 1 5 7 2]]
[[False False False False True]
[ True False False False False]
[ True False False False False]
[False True False False True]
[False False False True False]]
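As a side note (a sketch, not part of the original answer), the same result can be computed without apply_along_axis by broadcasting each row against its maximum:
output = matrix >= matrix.max(axis=1, keepdims=True)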
Without using numpy
If using a for inside a list comprehension still matches your question:
matrix = [[1, 2], [3, 4]]
list(map(
    lambda row: [print(c >= max(row), end=" ") for c in row] and print(),
    matrix
))
We can make this statement even a bit more cryptic by also removing the list comprehension:
matrix = [[1, 2], [3, 4]]
a = list(map(
    lambda row:
        list(map(lambda c: print(c >= max(row), end=" "), row))
        and print(),
    matrix
))
While this code does not make use of a for loop, I don't think it improves readability and/or performance.

numpy array slicing to avoid for loop

I am using numpy to do some calculations. In the following code:
assert(len(A.shape) == 2) # A is a 2D nparray
d1, d2 = A.shape
# want to initialize G, which has the same dimensions as A, and assign the last column of A to the last column of G
# initialize with value 0
G = np.zeros_like(A)
# assign the last column to that of G
G[:, d2-1] = A[:, d2-1]
# columns [0, d2-1) of G are the average of columns [0, d2-1) of A, based on the condition in B
for iW in range(d2-1):
    n = 0
    sum = 0.0
    for i in range(d1):
        if B[i, 0] != iW and B[i, 1] == 0:
            sum += A[i, iW]
            n += 1
    for i in range(d1):
        if B[i, 0] != iW and B[i, 1] == 0:
            G[i, iW] = sum / (1.0 * n)
return G
Is there an easier way using "slicing" or "boolean array"?
Thanks!
In case you want G to have the same dimensionality as A and then change the appropriate elements of G, the following code should work:
# create G as a copy of A, otherwise you might change A by changing G
G = A.copy()
# getting the mask for all columns except the last one
m = (B[:,0][:,None] != np.arange(d2-1)[None,:]) & (B[:,1]==0)[:,None]
# getting a matrix with those elements of A which fulfills the conditions
C = np.where(m, A[:,:d2-1], 0).astype(float)
# get the 'modified' average you use
avg = np.sum(C, axis=0) / np.sum(m.astype(int), axis=0)
# change the appropriate elements in all the columns except the last one
G[:,:-1] = np.where(m,avg,A[:,:d2-1])
After fiddling for a long time and finding bugs... I ended up with this code. I checked it against several random matrices A and specific choices of B:
A = np.random.randint(100, size=(5,10))
B = np.column_stack(([4,2,1,3,4],np.zeros(5)))
and so far your results and mine were in agreement.
Here's a start, focusing on the first inner loop:
In [35]: A = np.arange(12).reshape(3,4)
In [36]: B = np.array([[0,0],[1,0],[2,0]])
In [37]: iW = 0; sum = 0
In [38]: for i in range(3):
    ....:     if B[i,0] != iW and B[i,1] == 0:
    ....:         sum += A[i,iW]
    ....:         print(i, A[i,iW])
    ....:
1 4
2 8
In [39]: A[(B[:,0]!=iW)&(B[:,1]==0),iW].sum()
Out[39]: 12
I had to provide my own sample data to test this.
The 2nd loop has the same condition (B[:,0]!=iW)&(B[:,1]==0), and should work in the same way.
As one of the comments said, the dimensions of G look funny. To make things work with my sample, let's make a zeros array. It looks like you are assigning, to selected elements of G, the mean of a subset of A (sum/n):
In [52]: G=np.zeros_like(A)
In [53]: G[I,iW]=A[I,iW].mean()
Assuming n, the number of terms summed for each iW, varies, it may be difficult to compress the outer loop into a vectorized step. If n were the same, you could pull out the subset of A that matches the condition, e.g. A1, take the mean along one axis, and assign the values to G. With different numbers of terms in the sums, you still have to loop.
It just occurred to me that masked arrays might work. Mask off the terms of A that don't meet the condition, and then take the mean.
In [91]: I=(B[:,[0]]!=np.arange(4))&(B[:,[1]]==0)
In [92]: I
Out[92]:
array([[False, True, True, True],
[ True, False, True, True],
[ True, True, False, True]], dtype=bool)
In [93]: A1=np.ma.masked_array(A, ~I)
In [94]: A1
Out[94]:
masked_array(data =
[[-- 1 2 3]
[4 -- 6 7]
[8 9 -- 11]],
mask =
[[ True False False False]
[False True False False]
[False False True False]],
fill_value = 999999)
In [95]: A1.mean(0)
Out[95]:
masked_array(data = [6.0 5.0 4.0 7.0],
mask = [False False False False],
fill_value = 1e+20)
Or with plonser's where:
In [111]: np.where(I,A,0).sum(0)/I.sum(0)
Out[111]: array([ 6., 5., 4., 7.])
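Putting the pieces together, a loop-free sketch of the original function (not from the answers above; averaged is a hypothetical helper name). It mirrors the original code, which leaves non-matching entries of the first d2-1 columns at zero, and assumes every column has at least one row satisfying the condition:
import numpy as np

def averaged(A, B):
    d1, d2 = A.shape
    G = np.zeros_like(A, dtype=float)
    G[:, d2-1] = A[:, d2-1]                                  # copy the last column
    I = (B[:, [0]] != np.arange(d2-1)) & (B[:, [1]] == 0)    # condition mask, shape (d1, d2-1)
    avg = np.where(I, A[:, :d2-1], 0).sum(0) / I.sum(0)      # per-column 'modified' averages
    G[:, :d2-1] = np.where(I, avg, 0)                        # write the averages where the condition holds
    return G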

Optimised call to retrieve indices of the masked elements of a masked array?

I have a masked array:
a = np.arange(7)
a = np.ma.masked_greater(a,4)
a then contains
masked_array(data = [0 1 2 3 4 -- --],
mask = [False False False False False True True],
fill_value = 999999)
What I'm looking for now is an efficient way to retrieve an array that lists the index of each masked element, i.e.
res = [5, 6]
without looping through the mask like so:
res = []
for idx, data in enumerate(np.ma.getmaskarray(a)):
    if data:
        res.append(idx)
>>> a
masked_array(data = [0 1 2 3 4 -- --],
mask = [False False False False False True True],
fill_value = 999999)
>>> np.where(np.ma.getmaskarray(a))
(array([5, 6]),)
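Equivalently (a small variant, not part of the original answer), np.flatnonzero gives the indices straight away as a 1-D array:
>>> np.flatnonzero(np.ma.getmaskarray(a))
array([5, 6])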

getting indices when comparing multidimensional arrays

I have two numpy arrays, one an RGB image, one a lookup table of pixel values, for example:
img = np.random.randint(0, 9 , (3, 3, 3))
lut = np.random.randint(0, 9, (1,3,3))
What I'd like is to know the x,y coordinate in lut of pixels whose values are common to img and lut, so I tried:
for x in range(img.shape[0]):
    for y in range(img.shape[1]):
        print(np.transpose(np.concatenate(np.where(lut == img[x,y]))))
At this point, the problem is that img[x,y], which will be in the form [int_r, int_g, int_b], does not get evaluated as a single element, so the three components get searched for separately in lut...
I would like the output to be something like:
(x_coord, y_coord)
But I only get output in the form of:
[0 0 0]
[0 2 1]
[0 0 2]
[0 0 0]
[0 0 0]
[0 0 2]
[0 0 1]
[0 2 2]
[0 1 2]
Can anyone please help? Thanks!
img = np.random.randint(0, 9 , (3, 3, 3))
lut2 = img[1,2,:] # so that we know exactly the answer
# compare two matrices
img == lut2
array([[[False, False, False],
[False, False, False],
[False, True, False]],
[[False, False, False],
[False, False, False],
[ True, True, True]],
[[ True, False, False],
[ True, False, False],
[False, False, False]]], dtype=bool)
# rows with all true are the matching ones
np.where( (img == lut2).sum(axis=2) == 3 )
(array([1]), array([2]))
I don't really know why lut is filled with random numbers. But I assume that you want to look for the pixels that have exactly the same color. If so, this seems to work. Is this what you need?
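As a side note (not part of the original answer), the same "all three channels match" test can be written with .all(axis=2), which avoids hard-coding the 3:
np.where((img == lut2).all(axis=2))
(array([1]), array([2]))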
@otterb's answer works if lut is defined as a single [r,g,b] pixel slice, but it needs to be tweaked a little if you want to generalize this process to a multi-pixel lut:
img = np.random.randint(0, 9 , (3, 3, 3))
lut2 = img[0:1,0:2,:]
for x in range(lut2.shape[0]):
    for y in range(lut2.shape[1]):
        print(lut2[x,y])
        print(np.concatenate(np.where((img == lut2[x,y]).sum(axis=2) == 3)))
yields:
[1 1 7]
[0 0]
[8 7 4]
[0 1]
where triplets are pixel values, and doublets are their coordinates in the lut.
Cheers, and thanks to @otterb!
PS: iteration over numpy arrays is bad. The above is not production code.
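For completeness, a fully vectorized sketch (an assumption on my part, not from the answer above) that locates every exact RGB match between img and a multi-pixel lut2 in one broadcasted comparison:
matches = (img[:, :, None, None, :] == lut2[None, None, :, :, :]).all(axis=-1)
img_y, img_x, lut_y, lut_x = np.where(matches)   # paired img and lut coordinates of matching pixels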
