Search for a one-hot encoded label in ndarray


I have an ndarray called labels with a shape of (6000, 8): 6000 one-hot encoded arrays with 8 categories. I want to find the labels that look like this:
[1,0,0,0,0,0,0,0]
I tried this:
np.where(labels==[1,0,0,0,0,0,0,0,0])
but it does not produce the expected result.

You need to reduce with all along the second axis:
np.where((labels == [1,0,0,0,0,0,0,0]).all(1))
See with this smaller example:
import numpy as np
labels = np.array([[1,0,0,1,0,0,0,0],
                   [0,0,0,0,0,1,1,0],
                   [1,0,0,0,0,0,0,0],
                   [0,0,0,0,0,0,0,1]])
(labels == [1,0,0,0,0,0,0,0])
array([[ True, True, True, False, True, True, True, True],
[False, True, True, True, True, False, False, True],
[ True, True, True, True, True, True, True, True],
[False, True, True, True, True, True, True, False]])
Note that the comparison above simply returns a boolean array of the same shape as labels, since the target is broadcast against every row and compared element-wise. You need to aggregate with all to check whether all elements in a row are True:
(labels == [1,0,0,0,0,0,0,0]).all(1)
#array([False, False, True, False])
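Putting the pieces together, here is a minimal sketch that goes from the comparison to the matching row indices. The names target, row_matches and matching_rows are illustrative only and not part of the original post.

```python
import numpy as np

# Small stand-in for the (6000, 8) labels array from the question
labels = np.array([[1, 0, 0, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1, 1, 0],
                   [1, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 1]])
target = np.array([1, 0, 0, 0, 0, 0, 0, 0])  # must have one entry per column

# Element-wise comparison, then require every column in a row to match
row_matches = (labels == target).all(axis=1)

# Indices of the rows that equal the target
matching_rows = np.flatnonzero(row_matches)
print(matching_rows)  # [2]
```

Note that the target needs exactly as many entries as labels has columns (8 here); a list of a different length cannot be broadcast against the rows.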

Related

Finding the index of peak point ( the first True) in boolean tensor mask

After running a Detectron2 model in PyTorch, Detectron2 gives me the object masks it finds as a boolean (True/False) tensor. There are 33 objects found in the image, so the masks have shape torch.Size([33, 683, 1024]).
tensor([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]], device='cuda:0')
This is great so far, but I need the peak coordinates in the y dimension (height) of those 33 objects. (Let's say the object is a balloon; then I need the top of the balloon as an (x, y) point.)
Any idea how I can get the peak point coordinates as fast as possible?
Thanks in advance.

At first I iterated through each dimension and checked for the True condition, but it took minutes to find the indices.
Then I used torch.where, which finds all the indices that meet the condition almost instantly. The first index it returns for each mask belongs to the topmost True pixel:
for maskCounter in range(masks.shape[0]):
    ys, xs = torch.where(masks[maskCounter])  # indices come back in row-major order
    print(ys[0].item(), xs[0].item())  # row (y) and column (x) of the first True
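The per-mask loop can also be avoided entirely. The following is a rough sketch of that idea written with NumPy rather than PyTorch, assuming the masks have been moved to the CPU first (for example with masks.cpu().numpy()). The function name top_points and the found flag are hypothetical and not from the original answer.

```python
import numpy as np

def top_points(masks):
    """For an (N, H, W) boolean array, return the x and y index of the
    topmost True pixel of each mask, plus a flag for non-empty masks."""
    rows_with_object = masks.any(axis=2)          # (N, H): does row contain True?
    top_y = rows_with_object.argmax(axis=1)       # first such row per mask
    mask_ids = np.arange(masks.shape[0])
    top_x = masks[mask_ids, top_y].argmax(axis=1) # first True column in that row
    found = rows_with_object.any(axis=1)          # False where a mask is empty
    return top_x, top_y, found

# Tiny example with two masks of size 4 x 5
masks = np.zeros((2, 4, 5), dtype=bool)
masks[0, 2, 3] = True
masks[0, 3, 1] = True
masks[1, 1, 0] = True
top_x, top_y, found = top_points(masks)
print(list(zip(top_x.tolist(), top_y.tolist())))  # [(3, 2), (0, 1)]
```

This relies on np.argmax returning the first occurrence of the maximum. For an empty mask the returned coordinates are (0, 0), which is why the found flag should be checked.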

Subtracting image ellipses from each other using numpy masks

I am trying to calculate the area within a specific region of an eye image by using ellipse-shaped masks and taking the mean of the values inside each mask.
What I want to do is calculate the area of the sclera and the iris separately. My plan is to generate three masks: one for the iris alone, a second for the entire eye, and a third obtained by subtracting the iris mask from the entire-eye mask, which gives the sclera mask.
The problem is that my mask function returns boolean values. This is what I was trying to do:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
img = Image.open(r'imgpath')
h_1 = 16
k_1 = 31
a_1 = 7
b_1 = 17
# ellipse function
def _in_ellipse(x, y, h, k, a, b):
    z = ((x-h)**2)/a**2 + ((y-k)**2)/b**2
    if z < 1:
        return True
    else:
        return False
in_ellipse = np.vectorize(_in_ellipse)
img = np.asarray(img)
mask = in_ellipse(*np.indices(img.shape), h_1, k_1, a_1, b_1)
# Visualize the mask size
plt.imshow(mask)
plt.show()
# See if it's inside the boundaries
plt.imshow(np.where(mask, img, np.nan))
plt.show()
mask_mean = np.nanmean(np.where(mask, img, np.nan))
Before calculating the mean values, I want to get the mean value of the sclera alone. My attempt was to subtract the two areas, but the ellipse function does not return pixel values as I expected; it returns boolean values:
mask:
array([[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]])

From what I understand, to generate the sclera_mask from eye_mask and iris_mask, both of which are of boolean type, your idea of subtraction translates to a logical, element-wise XOR operation on the two masks:
sclera_mask = np.logical_xor(eye_mask, iris_mask)
This matches subtraction as long as the iris mask lies entirely inside the eye mask. More on this in the docs: numpy.logical_xor
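As a minimal sketch of the whole workflow, the example below builds both ellipse masks with plain array arithmetic (no np.vectorize), derives the sclera mask, and takes the area and mean. The helper ellipse_mask, the synthetic image, and all ellipse parameters are made up for illustration. It uses eye_mask & ~iris_mask, which is a set difference and coincides with the XOR above whenever the iris lies inside the eye.

```python
import numpy as np

def ellipse_mask(shape, h, k, a, b):
    """Boolean mask that is True inside the ellipse centred at (h, k)
    with semi-axes a and b, using the same formula as the question."""
    x, y = np.indices(shape)
    return ((x - h) ** 2) / a ** 2 + ((y - k) ** 2) / b ** 2 < 1

# Synthetic stand-in for the grayscale eye image
img = np.arange(64 * 64, dtype=float).reshape(64, 64)

# Hypothetical ellipse parameters; the iris sits inside the eye
eye_mask = ellipse_mask(img.shape, 32, 32, 14, 28)
iris_mask = ellipse_mask(img.shape, 32, 32, 7, 10)

# Pixels that belong to the eye but not to the iris
sclera_mask = eye_mask & ~iris_mask

sclera_area = np.count_nonzero(sclera_mask)  # area in pixels
sclera_mean = img[sclera_mask].mean()        # mean pixel value in the sclera
```

Boolean indexing (img[sclera_mask]) selects only the pixels inside the mask, so no NaN filling is needed before taking the mean.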

Logical comparison of a symmetric matrix and its transpose

I was trying to test numerically if the multiplication of a matrix and its transpose really generates a symmetric square matrix.
Below is the code I used:
mat = np.array([[1,2,3],[1,0,1],[1,1,1],[2,3,5]])
mat2 = np.linalg.inv(np.matmul(np.transpose(mat),mat))
mat2
array([[ 1.42857143, 0.42857143, -0.85714286],
[ 0.42857143, 1.92857143, -1.35714286],
[-0.85714286, -1.35714286, 1.21428571]])
mat2 appears to be symmetric.
However, the result of the below code made me confused:
np.transpose(mat2) == mat2
array([[ True, False, False],
[False, True, False],
[False, False, True]])
But when I did the same comparison on the product itself, without taking the inverse, the result was as I expected:
np.transpose(np.matmul(np.transpose(mat),mat)) == np.matmul(np.transpose(mat),mat)
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
Is this a numerical precision issue? If so, how can I show that the off-diagonal elements are identical?

When you compare the product np.matmul(mat.T, mat) with its transpose, you're comparing integers to integers and there's no problem.
mat2 is floating point, which is prone to subtle rounding errors. When you print mat2, you're seeing a rounded version of the full digits. Look at the difference between mat2 and mat2.T:
>>> mat2 - mat2.T
array([[ 0.00000000e+00, 1.11022302e-16, -1.11022302e-16],
[-1.11022302e-16, 0.00000000e+00, 2.22044605e-16],
[ 1.11022302e-16, -2.22044605e-16, 0.00000000e+00]])
The differences are on the order of 0.0000000000000001, meaning that they're equal "for all intents and purposes" but not exactly equal. There are two places to go from here. You can either accept that numerical precision is limited and use something like numpy.allclose for your equality tests, which allows for some small errors:
>>> np.allclose(mat2, mat2.T)
True
Or, if you really insist on your matrices being symmetric, you can enforce it with something like this:
>>> mat3 = (mat2 + mat2.T)/2
>>> mat3 == mat3.T
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
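Both options can be checked in one short, self-contained script. This is only a sketch of the steps described above; the variable names max_asymmetry and is_symmetric are illustrative.

```python
import numpy as np

mat = np.array([[1, 2, 3], [1, 0, 1], [1, 1, 1], [2, 3, 5]])
mat2 = np.linalg.inv(mat.T @ mat)  # floating-point result

# Size of the largest deviation from exact symmetry (rounding noise)
max_asymmetry = np.abs(mat2 - mat2.T).max()

# Option 1: tolerance-based comparison (defaults: rtol=1e-05, atol=1e-08)
is_symmetric = np.allclose(mat2, mat2.T)

# Option 2: force exact symmetry by averaging with the transpose
mat3 = (mat2 + mat2.T) / 2

print(is_symmetric)             # True
print((mat3 == mat3.T).all())   # True
```

The averaged matrix is exactly symmetric because floating-point addition is commutative, so entry (i, j) and entry (j, i) of mat3 are computed from the same two operands.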

Speeding up code: finding the amount of points within a certain radius

I have to find the number of jellyfish that stranded within a certain radius of a beach. I have masked latitude and longitude arrays.
My arrays look like this (Xpos1 and Ypos1 are similar):
Xpos1
masked_array(
data=[[50.04410171508789, 50.06398010253906, 50.08057403564453, ...,
--, --, --],
[49.99235534667969, 50.02357482910156, 50.0404052734375, ..., --,
--, --],
[50.04730987548828, 50.074710845947266, 50.092201232910156, ...,
--, --, --],
...,
[49.98905944824219, 50.507293701171875, 50.48957061767578, ...,
51.069766998291016, 50.74513626098633, 51.06978988647461],
[49.91417694091797, 50.510562896728516, 50.48354721069336, ...,
51.069766998291016, 50.95227813720703, 51.06978988647461],
[49.976619720458984, 50.504817962646484, 50.487918853759766, ...,
51.069766998291016, 50.75497817993164, 51.06978988647461]],
mask=[[False, False, False, ..., True, True, True],
[False, False, False, ..., True, True, True],
[False, False, False, ..., True, True, True],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]],
fill_value=9.96921e+36,
dtype=float32)
len(Xpos1[0])= 800 000 #Particle
len(Xpos1)=124 # Timesteps. Xpos1[0] is day 1 hour 1, Xpos1[1] is day 1 hour 2 and so on.
And now I have to find which of these points are close to a certain beach. The problem is that my current code takes 5 minutes even if I only use Xpos1[0:3], because the arrays are so big. Given that I need to check this for several areas, my current code would take a lifetime to finish running.
How do I speed up this code? The output I want is an array (Blen) that gives, for every timestep, the number of jellyfish within the radius.
Blen=[124,253,100,...]
len(Blen)=124
My code is:
#Location of the beach I need to check
lonAa=2.631547
latAa=51.120983
#Frame
#1 degree longitude ||
LongDegree=2*pi*r*(np.cos(math.radians(LatM)))/360
#1 degree latitude =
LatDegree=2*pi*r/360
FrameRadius=10#km
Timestep=3 #Number of timesteps you want to check, for now only 3 and it already takes some time
#Make the frame within which the jellys have to be counted
FrameLeft=lonAa-(FrameRadius/LongDegree)
FrameRight=lonAa+(FrameRadius/LongDegree)
FrameUp=latAa+(FrameRadius/LatDegree)
FrameDown=latAa-(FrameRadius/LatDegree)
BFrame=np.zeros((Timestep,len(Xpos1[0])))
#And now the slow part
Blen=[]
for i in range (0,Timestep):
    for j in range (0,len(Xpos1[0])):
        if FrameDown<=Ypos1[i][j]<=FrameUp and FrameLeft<=Xpos1[i][j]<=FrameRight:
            BFrame[i][j]=Ypos1[i][j]
Blen=np.count_nonzero(BFrame,axis=1)

What you're looking to do first is called "profiling" the script. Python already has a good profiling tool called cProfile:
https://docs.python.org/2/library/profile.html
You can even invoke cProfile from the command line with:
python -m cProfile -o output_file python_script.py
From there, you should be able to infer where the bottlenecks in your code are.
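In the code from the question, the bottleneck is very likely the nested Python loop over every particle. As a hedged sketch of one way to remove it, the frame test can be applied to whole arrays at once with NumPy. The function count_in_frame and the tiny test arrays below are hypothetical, and the sketch assumes masked entries should not be counted.

```python
import numpy as np

def count_in_frame(xpos, ypos, left, right, down, up):
    """Count, for every timestep (row), the particles whose longitude and
    latitude fall inside the rectangular frame. Masked entries are ignored."""
    inside = (xpos >= left) & (xpos <= right) & (ypos >= down) & (ypos <= up)
    inside = np.ma.filled(inside, False)  # treat masked positions as outside
    return np.count_nonzero(inside, axis=1)

# Tiny stand-in for the (timesteps, particles) masked arrays
xpos = np.ma.masked_array([[1.0, 2.0, 3.0],
                           [2.0, 2.5, 9.0]],
                          mask=[[False, False, True],
                                [False, False, False]])
ypos = np.ma.masked_array([[5.0, 5.0, 5.0],
                           [5.0, 5.0, 5.0]],
                          mask=[[False, False, True],
                                [False, False, False]])

counts = count_in_frame(xpos, ypos, left=1.5, right=3.5, down=0.0, up=10.0)
print(counts)  # [1 2]
```

With the names from the question, the call would look like count_in_frame(Xpos1, Ypos1, FrameLeft, FrameRight, FrameDown, FrameUp), which returns one count per timestep without any explicit loop.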

sequence mask in tensorflow

I have been following the TensorFlow documentation for a while.
Recently I found the function sequence_mask(), which is very useful.
According to the official documentation, its signature is:
sequence_mask(
lengths,
maxlen=None,
dtype=tf.bool,
name=None
)
The documentation also provides two examples:
tf.sequence_mask([1, 3, 2], 5) # [[True, False, False, False, False],
# [True, True, True, False, False],
# [True, True, False, False, False]]
tf.sequence_mask([[1, 3],[2,0]]) # [[[True, False, False],
# [True, True, True]],
# [[True, True, False],
# [False, False, False]]]
When I tested them on my computer, the first example executed successfully, but when I ran the second example, I got this error message:
ValueError: lengths must be 1D for sequence_mask
So what is the problem?

I think it's because of the TensorFlow version that you have installed. Update to TensorFlow version 1.4.0 and it should work fine.
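To see why the second example should be expected to work for multi-dimensional lengths, here is a small NumPy sketch of the masking logic that the documented examples describe. It is not TensorFlow's implementation; sequence_mask_np is a hypothetical helper written only to reproduce the two documented outputs.

```python
import numpy as np

def sequence_mask_np(lengths, maxlen=None):
    """NumPy illustration of the documented behaviour: position j of each
    mask is True when j is smaller than the corresponding length."""
    lengths = np.asarray(lengths)
    if maxlen is None:
        maxlen = int(lengths.max())  # default: the largest length
    # Broadcasting adds one trailing axis of size maxlen to lengths' shape
    return np.arange(maxlen) < lengths[..., np.newaxis]

first = sequence_mask_np([1, 3, 2], 5)        # shape (3, 5)
second = sequence_mask_np([[1, 3], [2, 0]])   # shape (2, 2, 3)
print(second.tolist())
# [[[True, False, False], [True, True, True]],
#  [[True, True, False], [False, False, False]]]
```

The output always has one more dimension than lengths, which is exactly what the second documented example shows.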
