I have a 2D numpy array which looks like
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]]) `
I want to create bounding box like masks over the 1s shown above. For example it should look like this
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.]])
How can I do it it easily? Also how do I do it if other no.s like 2,3 etc exist but I want to ignore them and the groups are mostly 2.
We have skimage.measure to make life easy when it comes to component labeling. We can use skimage.measure.label to label the different components in the array, and skimage.measure.regionprops to obtain the corresponding slices, which we can use to set the values to 1 in this case:
def fill_bounding_boxes(x):
l = label(x)
for s in regionprops(l):
x[s.slice] = 1
return x
If we try with the proposed example:
from skimage.measure import label, regionprops
a = np.array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.]])
We get:
fill_bounding_boxes(x)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.]])
While the previous responses are perfectly fine, here's how you could do it with scipy.ndimage:
import numpy as np
from scipy import ndimage
def fill_bboxes(x):
x_components, _ = ndimage.measurements.label(x, np.ones((3, 3)))
bboxes = ndimage.measurements.find_objects(x_components)
for bbox in bboxes:
x[bbox] = 1
return x
ndimage.measurements.label does a connected component labelling with the 3x3-"ones" matrix defining the neighbourhood. find_objects then determines the bounding box for each component, which you can then use to set everything within to 1.
There is one solution, but its a little bit hacky and I will not program it for you.
OpenCV - Image processing library, has a algorithm for finding Rectangular contour -> Straight or Rotated. What you may want to do is to transform your array into 2D grayscale image, find contours and write inside the contours your 1s.
Check this image - it is from Opencv DOC - 7.a - https://docs.opencv.org/3.4/dd/d49/tutorial_py_contour_features.html
You would be interested in everything that is inside green lines.
To be honest, I think seems to me much easier than programming some algorithm for bounding boxes
Note
Of course you dont really need to do the image stuff, but I think it is enough to use opencv's algorithm for the bounding boxes(countours)
This is an interesting problem. A 2D convolution is a natural approach. However, if the input matrix is sparse (as it appears in your example), this can be costly. For sparse matrix, another approach is to use a clustering algorithm. This extracts only the non-zero pixels from the input box a (the array in your example), and runs a hierarchical clustering. The clustering is based on a special distance matrix (a tuple). Merging happens if boxes are separated by a max of 1 pixel in either direction. You can also apply filter for any numbers you need in the initialization step (say only do for a[row, col]==1 and skip any other numbers, or whatever you wish.
from collections import namedtuple
Point = namedtuple("Point",["x","y"]) # a pixel on the matrix
Box = namedtuple("Box",["tl","br"]) # a box defined by top-lef/bottom-right
def initialize(a):
""" create a separate bounding box at each non-zero pixel. """
boxes = []
rows, cols = a.shape
for row in range(rows):
for col in range(cols):
if a[row, col] != 0:
boxes.append(Box(Point(row, col),Point(row, col)))
return boxes
def dist(box1, box2):
""" dist between boxes is from top-left to bottom-right, or reverse. """
x = min(abs(box1.br.x - box2.tl.x), abs(box1.tl.x - box2.br.x))
y = min(abs(box1.br.y - box2.tl.y), abs(box1.tl.y - box2.br.y))
return x, y
def merge(boxes, i, j):
""" pop the boxes at the indices, merge and put back at the end. """
if i == j:
return
if i >= len(boxes) or j >= len(boxes):
return
ii = min(i, j)
jj = max(i, j)
box_i = boxes[ii]
box_j = boxes[jj]
x, y = dist(box_i, box_j)
if x < 2 or y < 2:
tl = Point(min(box_i.tl.x, box_j.tl.x),min(box_i.tl.y, box_j.tl.y))
br = Point(max(box_i.br.x, box_j.br.x),max(box_i.br.y, box_j.br.y))
del boxes[ii]
del boxes[jj-1]
boxes.append(Box(tl, br))
def cluster(a, max_iter=100):
"""
initialize the cluster. then loop through the length and merge
boxes. break if `max_iter` reached or no change in length.
"""
boxes = initialize(a)
n = len(boxes)
k = 0
while k < max_iter:
for i in range(n):
for j in range(n):
merge(boxes, i, j)
if n == len(boxes):
break
n = len(boxes)
k = k+1
return boxes
cluster(a)
# output: [Box(tl=Point(x=2, y=2), br=Point(x=5, y=4)),Box(tl=Point(x=11, y=9), br=Point(x=14, y=11))]
# performance 275 µs ± 887 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# compares to 637 µs ± 9.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) for
#the method based on 2D convolution
This returns a list of boxes defined by the corner points (top-left and bottom-right). Here x is the row number and y is the column numbers. The initialization loops through the entire matrix. But after that we only process a very small subset of points. By changing the dist function, you can customize the box definition (overlapping, non-overlapping etc). Performance can further be optimized (for e.g. breaking if i or j greater the length of boxes within the for loops, than simply returning from the merge function and continue).
Related
Could anyone teach me why the below code uses dim=1 in the scatter_ method? The meaning of the attached codes is for one-hot encoding. I tried to read the PyTorch document example and thought I should use dim=0 for the desired result. However, the result has shown that dim=1 is correct instead.
>>> target = torch.tensor([3, 5, 0, 2, 7, 5])
>>> target
tensor([3, 5, 0, 2, 7, 5])
>>> onehot = torch.zeros(target.shape[0], 8)
>>> onehot.scatter_(1, target.unsqueeze(1), 1.0)
tensor([[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 1., 0., 0.]])
You are applying scatter on a zero tensor onehot shaped (len(target), 8) on dim=1 using target as input and 1. as value. This will have the following effect on onehot:
onehot[i][target[i][j]] = 1.
This means for every row in target it will look at the unique value since j is always equal to 1 and use it to index the 2nd axis of onehot. In other words, for every row, it takes the value from target to position the 1. among the columns of onehot.
Step by step illustration would be:
>>> for i in range(len(target)):
... k = target[i] # k, depends on values of target i.e. dim=1
... onehot[i, k] = 1
... print(onehot)
tensor([[0., 0., 0., 1., 0., 0., 0., 0.], # i=0; k=3
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.], # i=1; k=5
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.], # i=2; k=0
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.], # i=3; k=2
[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.], # i=4; k=7
[0., 0., 0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 1., 0., 0.]]) # i=5; k=5
Notice that onehot.scatter_(0, target.unsqueeze(1), 1.0) would have produced:
onehot[target[i][j]][j] = 1.
Which is a valid operation only if you initialize onehot the other way around:
>>> onehot = torch.zeros(8, len(target))
>>> onehot.scatter_(0, target.unsqueeze(1), 1.)
tensor([[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.]])
And you get the transpose of the other matrix.
I am new in computer vision and I am currently working on a numpy array of 0 and 1 on python as follow :
I am trying to find the contour of the shape formed by the cells that are equal to 1, this is what the result should look like :
I would like to be able to get the position of each element highlighted in green following a certain order (counter clockwise for exemple).
I tried to use the findContours function of OpenCV in python by following some examples I found on the web but I didn't make it work :
# Import
import numpy as np
import cv2
# Find contours
tableau_poche = np.array([[1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.]])
tableau_poche = np.int8(tableau_poche)
contours, hierarchy = cv2.findContours(tableau_poche, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
I get the following message :
error: OpenCV(4.0.1) C:\ci\opencv-suite_1573470242804\work\modules\imgproc\src\thresh.cpp:1492: error: (-210:Unsupported format or combination of formats) in function 'cv::threshold'
Actually, I don't know if I am supposed to use this OpenCV function (maybe the "matplotlib.pyplot.contour()" function can solve my problem too ...) or if it's possible to use it on the numpy array I have. In a near future, I might be interested by the convexityDefects function of OpenCV on my numpy array.
You have a type issue. This OpenCV function works only with unsigned integer uint8. Your array uses signed integer int8.
simply replace:
tableau_poche = np.int8(tableau_poche)
by
tableau_poche = tableau_poche.astype(np.uint8)
and the findContour() will work.
As pointed out in comments, you can get what you want (or very close) by changing the attributes of the findContour(). By using cv2.CHAIN_APPROX_NONE instead of cv2.CHAIN_APPROX_SIMPLE, it will give you all the points of the contour. However, it works vertically, horizontally and diagonally for any pixel of distance 1. So it is not 100% what you had on your question as it "cuts" the corners.
See documentation here for more info on options ContourApproximationModes
My data are currently in Pandas data frame and the shape of each data frame like:
Matrix1, matrix2, and matrix3 each having shapes like 4 rows and 19 columns. Colum names are different for each matrix.
How do I convert these into a tensor of a shape 3,4,19? Can anybody help to solve this?
Let's say you have the following dataframes:
data = [pd.DataFrame(np.zeros((4,19))) for x in range(3)] #shape 4, 19
list_of_matrices = [np.array(df) for df in data]
That is:
[array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.]]),
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.]]),
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.]])]
Then turning them into one single tensor can be done this way:
import torch
list_of_matrices = [np.array(df) for df in data]
torch.tensor(np.stack(list_of_matrices))
list_of_tensors = [torch.tensor(np.array(df)) for df in data]
torch.stack(list_of_tensors)
which gives:
tensor([[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]],
dtype=torch.float64)
I am building up dataset for Seq2Seq model which requires the data to be in the form of one-hot encoded padded sequences.
For Example if my sequence contains 'a' (a), then it should generate something like following (given max sequence size can be 4):
[[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
So I tired to first pad the sequence and then one-hot encode the padded sequences (somewhat answered in this answer).
train_padded_txt_Y1 = to_categorical(pad_sequences(training_txt_Y1, maxlen=max_label_len, padding='post', value = len(char_list)))
However, the above produces one-hot-encoded padded sequences like following, that are where the padding character is being treated as a class to be encoded:
[[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]
You can see an additional element in the generated one-hot encoding of each.
So the question here is that can something be done using Keras utilities to get the one-hot encoded padded sequence that I need or do I have to go for some custom implementation?
Given an adjacency list Y:
Y = np.array([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0.]])
and list of indexes of random numbers:
idx = sorted(random.sample(range(0, len(Y)), 5))
[0, 3, 7, 10, 14]
I would like 0th, 3rd, 7th, 10th and 14th row/column of the adjacency matrix extracted such that my new Yhat becomes the point where the 5 rows/columns overlaps such as:
meaning my Yhat becomes
Yhat = np.array([[0,0,0,0,0],
[0,0,0,1,0],
[0,0,0,0,0],
[0,1,0,0,0],
[0,0,0,0,0]])
Right now I am doing it with loops and checks, but I feel like it should be possible to do with numpy list slicing, any hints would be appreciated!
This seems to do the trick, first slice the idx rows, then slice the idx columns: Y[idx][:,idx]