Merging regions in MSER for identifying text lines in OCR - python

I am using MSER to identify text regions in an image. I am using the following code to extract the regions and save them as an image. Currently, each identified region is saved as a separate image. But I want the regions belonging to a single line of text to be merged into one image.
import cv2
img = cv2.imread('newF.png')
mser = cv2.MSER_create()
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))
How can I stitch the regions that belong to a single line together? I understand the logic will mostly be based on some heuristic for identifying areas with nearby y-coordinates.
But how exactly can the regions be merged in OpenCV? I am missing this part, as I am new to OpenCV. Any help would be appreciated.
Attaching a sample image
The desired outputs are as follows
Another line
Another Line

If you are particular about using MSER, then, as you mentioned, a heuristic for combining areas with nearby y-coordinates can be used. The following approach might not be efficient, and I will try and optimize it, but it might give you an idea about how to tackle the problem.
First, let us plot all the bboxes determined by MSER:
coordinates, bboxes = mser.detectRegions(gray)
for bbox in bboxes:
    x, y, w, h = bbox
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
This gives us -
Now, it is evident from the bboxes that the heights vary quite a lot, even within a single line. Thus, for clustering bounding boxes into a single line, we have to come up with an interval. I couldn't come up with something foolproof, so I went with half the median of all the heights of the given bboxes, which works well for the given case.
bboxes_list = list()
heights = list()
for bbox in bboxes:
    x, y, w, h = bbox
    bboxes_list.append([x, y, x + w, y + h])  # Create list of bounding boxes, with each bbox containing the left-top and right-bottom coordinates
    heights.append(h)
heights = sorted(heights)  # Sort heights
median_height = heights[len(heights) // 2] / 2  # Find half of the median height (integer division for the index)
Now, to group the bounding boxes, given a particular interval for the y-coordinates (here, half the median height), I am modifying a snippet that I once found on Stack Overflow (I will add the source once I find it). This function takes in a list, along with a specific interval, and returns a list of groups, where each group contains bounding boxes whose absolute difference in y-coordinates is less than or equal to the interval. Please note that the iterable / list needs to be sorted based on the y-coordinate.
def grouper(iterable, interval=2):
    prev = None
    group = []
    for item in iterable:
        if not prev or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group
Thus, before grouping the bounding boxes, they need to be sorted based on the y-coordinate. After grouping, we iterate through each group, and determine the min x-coordinate, min y-coordinate, max x-coordinate, and max y-coordinate required to draw a bounding box that covers all the bounding boxes in a given group.
bboxes_list = sorted(bboxes_list, key=lambda k: k[1])  # Sort the bounding boxes based on the y1 coordinate ( y of the left-top coordinate )
combined_bboxes = grouper(bboxes_list, median_height)  # Group the bounding boxes
for group in combined_bboxes:
    x_min = min(group, key=lambda k: k[0])[0]  # Find min of x1
    x_max = max(group, key=lambda k: k[2])[2]  # Find max of x2
    y_min = min(group, key=lambda k: k[1])[1]  # Find min of y1
    y_max = max(group, key=lambda k: k[3])[3]  # Find max of y2
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
Final resultant image -
Again, I would like to reiterate that there might be ways to optimize this approach further. The goal is to give you an idea about how such problems can be tackled.

Maybe even something as primitive as dilate-erode could be made to work in your case? For example, if I use an erode operation followed by a dilate operation on your original image, mostly in the horizontal direction, e.g.:
img = cv2.erode(img, np.ones((1, 20)))
img = cv2.dilate(img, np.ones((1, 22)))
the result is something like:
So if we draw that over the original image, it becomes:
I didn't resize the original image as you do (probably to detect those small separate dots and stuff). Not ideal (I don't know how MSER works), but with enough tweaking maybe you could even use simple detection of connected components with this?
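For instance, a minimal sketch of that last idea (the kernel sizes, the area filter, and the output file name are guesses; `newF.png` is the file name from the question): fuse the characters on each line with the horizontal erode/dilate, then label the blobs with cv2.connectedComponentsWithStats and draw one box per line.
import cv2
import numpy as np

img = cv2.imread('newF.png', cv2.IMREAD_GRAYSCALE)

# Erode, then dilate, mostly horizontally (as above) so that characters
# on the same line bleed into one dark blob
merged = cv2.erode(img, np.ones((1, 20), np.uint8))
merged = cv2.dilate(merged, np.ones((1, 22), np.uint8))

# Invert and threshold so the merged text blobs become the white foreground
_, binary = cv2.threshold(merged, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Each connected component should now roughly correspond to one line of text
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for x, y, w, h, area in stats[1:]:   # stats[0] is the background component
    if area > 50:                    # arbitrary filter against small specks
        cv2.rectangle(vis, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
cv2.imwrite('lines.png', vis)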

Related

How to split an image into multiple images based on white borders between them

I need to split an image into multiple images, based on the white borders between them.
for example:
output:
Using Python. I don't know how to start this task.
Here is a solution for the "easy" case where we know the grid configuration. I provide this solution even though I doubt this is what you were asked to do.
In your example image of the cat, if we are given the grid configuration, 2x2, we can do:
from PIL import Image

def subdivide(file, nx, ny):
    im = Image.open(file)
    wid, hgt = im.size  # Size of input image
    w = int(wid/nx)     # Width of each subimage
    h = int(hgt/ny)     # Height of each subimage
    for i in range(nx):
        x1 = i*w        # Horizontal extent...
        x2 = x1+w       # of subimage
        for j in range(ny):
            y1 = j*h    # Vertical extent...
            y2 = y1+h   # of subimage
            subim = im.crop((x1, y1, x2, y2))
            subim.save(f'{i}x{j}.png')

subdivide("cat.png", 2, 2)
The above will create these images:
My previous answer depended on knowing the grid configuration of the input image. This solution does not.
The main challenge is to detect where the borders are and, thus, where the rectangles that contain the images are located.
To detect the borders, we'll look for (vertical and horizontal) image lines where all pixels are "white". Since the borders in the image are not really pure white, we'll use a value less than 255 as the whiteness threshold (WHITE_THRESH in the code.)
The gist of the algorithm is in the following lines of code:
whitespace = [np.all(gray[:,i] > WHITE_THRESH) for i in range(gray.shape[1])]
Here "whitespace" is a list of Booleans that looks like
TTTTTFFFFF...FFFFFFFFTTTTTTTFFFFFFFF...FFFFTTTTT
where "T" indicates the corresponding horizontal location is part of the border (white).
We are interested in the x-locations where there are transitions between T and F. The call to the function slices(whitespace) returns a list of tuples of indices
[(x1, x2), (x1, x2), ...]
where each (x1, x2) pair indicates the xmin and xmax location of images in the x-axis direction.
The slices function finds the "edges" where there are transitions between True and False using the exclusive-or operator and then returns the locations of the transitions as a list of tuples (pairs of indices).
Similar code is used to detect the vertical location of borders and images.
The complete runnable code below takes as input the OP's image "cat.png" and:
Extracts the sub-images into 4 PNG files "fragment-0-0.png", "fragment-0-1.png", "fragment-1-0.png" and "fragment-1-1.png".
Creates a (borderless) version of the original image by pasting together the above fragments.
The runnable code and resulting images follow. The program runs in about 0.25 seconds.
from PIL import Image
import numpy as np

def slices(lst):
    """ Finds the indices where lst changes value and returns them in pairs
        lst is a list of booleans
    """
    edges = [lst[i-1] ^ lst[i] for i in range(len(lst))]
    indices = [i for i, v in enumerate(edges) if v]
    pairs = [(indices[i], indices[i+1]) for i in range(0, len(indices), 2)]
    return pairs

def extract(xx_locs, yy_locs, image, prefix="image"):
    """ Locate and save the subimages """
    data = np.asarray(image)
    for i in range(len(xx_locs)):
        x1, x2 = xx_locs[i]
        for j in range(len(yy_locs)):
            y1, y2 = yy_locs[j]
            arr = data[y1:y2, x1:x2, :]
            Image.fromarray(arr).save(f'{prefix}-{i}-{j}.png')

def assemble(xx_locs, yy_locs, prefix="image", result='composite'):
    """ Paste the subimages into a single image and save """
    wid = sum([p[1]-p[0] for p in xx_locs])
    hgt = sum([p[1]-p[0] for p in yy_locs])
    dst = Image.new('RGB', (wid, hgt))
    x = y = 0
    for i in range(len(xx_locs)):
        for j in range(len(yy_locs)):
            img = Image.open(f'{prefix}-{i}-{j}.png')
            dst.paste(img, (x, y))
            y += img.height
        x += img.width
        y = 0
    dst.save(f'{result}.png')

WHITE_THRESH = 110  # The original image borders are not actually white
image_file = 'cat.png'
image = Image.open(image_file)

# To detect the (almost) white borders, we make a grayscale version of the image
gray = np.asarray(image.convert('L'))

# Detect location of images along the x axis
whitespace = [np.all(gray[:, i] > WHITE_THRESH) for i in range(gray.shape[1])]
xx_locs = slices(whitespace)

# Detect location of images along the y axis
whitespace = [np.all(gray[i, :] > WHITE_THRESH) for i in range(gray.shape[0])]
yy_locs = slices(whitespace)

extract(xx_locs, yy_locs, image, prefix='fragment')
assemble(xx_locs, yy_locs, prefix='fragment', result='composite')
Individual fragments:
The composite image:

Map corresponding points between Delaunay triangles

I'm trying to morph two images of faces using an inverse warp. I have the Delaunay triangles for both images as well as all transformation matrices for all pairs of corresponding triangles.
I have applied the matrix to every pixel inside the triangles, but the image I am getting is all messed up and some pixels aren't being filled in as well.
I suspect the vertices lists are not in order which means the triangles are not corresponding. Or it could just be me messing up the row, cols order.
Here's my code:
from scipy.spatial import Delaunay
from skimage.draw import polygon
import numpy as np
import cv2

def drawDelaunay(img, landmarks, color):
    tri = Delaunay(landmarks)
    vertices = []
    for t in landmarks[tri.simplices]:
        # t = [int(i) for i in t]
        pt1 = [t[0][0], t[0][1]]
        pt2 = [t[1][0], t[1][1]]
        pt3 = [t[2][0], t[2][1]]
        cv2.line(img, pt1, pt2, color, 1, cv2.LINE_AA, 0)
        cv2.line(img, pt2, pt3, color, 1, cv2.LINE_AA, 0)
        cv2.line(img, pt3, pt1, color, 1, cv2.LINE_AA, 0)
        vertices.append([pt1, pt2, pt3])
    return img, vertices

def getAffineMat(triangle1, triangle2):
    x = np.transpose(np.matrix([*triangle1]))
    y = np.transpose(np.matrix([*triangle2]))
    # Add ones to bottom of x and y
    x = np.vstack((x, [1, 1, 1]))
    y = np.vstack((y, [1, 1, 1]))
    xInv = np.linalg.pinv(x)
    return np.dot(y, xInv)

srcImg = face2
srcRows, srcCols, srcDepth = face2.shape
destImg = np.zeros(face1.shape, dtype=np.uint8)
for triangle1, triangle2 in zip(vertices1, vertices2):
    transMat = getAffineMat(triangle1, triangle2)
    r, c = list(map(list, zip(*triangle2)))
    rr, cc = polygon(r, c)
    for row, col in zip(rr, cc):
        transformed = np.dot(transMat, [col, row, 1])
        srcX, srcY, *_ = np.array(transformed.T)
        # Check if pixel is within image boundaries
        if isWithinBounds(srcCols, srcRows, col, row):
            # Interpolate the color of the pixel from the four nearest pixels
            color = bilinearInterpolation(srcImg, srcX, srcY)
            # Set the color of the current pixel in the destination image
            destImg[row, col] = color
I wish to implement this without getAffineTransform or warpAffine. Any help would be much appreciated!
Sources:
Transfer coordinates from one triangle to another triangle
https://devendrapratapyadav.github.io/FaceMorphing/
But you don't have corresponding triangles! This looks like two separate Delaunay triangulations. They may have been made on matching points, but there are still no matching triangles. You can't do two Delaunay triangulations, one in each image, and expect them to match. You need one Delaunay triangulation, and then use the same edges on both sides (so, for at least one side, the triangulation will not be exactly Delaunay).
Look for example at the top-right corner of your images.
On one side you have 4 outgoing edges (counting those we can't see because they coincide with the image border, but they have to be there); on the other you have 6 outgoing edges.
The number of edges connected to two matching vertices is supposed to be a constant (otherwise, how could you warp anything?).
So, clearly, I think (I can only surmise, since you did not provide the code for that part; you postulate that the triangulation is correct, when I am pretty sure the triangulation is exactly what is wrong) that you got two sets of matching points, then performed two Delaunay triangulations on those two sets of points, expecting to be able to match triangles, even though they are not at all the same triangles.
Edit: how to transform
(in reply to your question in comment)
It's the same triangulation. You have a list of points p₁, p₂, p₃, ..., pₙ in the first image and a matching list of points q₁, q₂, q₃, ..., qₙ in the second image. You perform a triangulation in the 1st image, whose output should be a list of triplets of indices, such as (1,3,4), (1, 2, 3), ..., meaning that the optimal triangulation in the 1st image is the one made of triangles (p₁,p₃,p₄), (p₁,p₂,p₃), ...
And in the second image, you use the triangulation (q₁,q₃,q₄), (q₁, q₂, q₃), ...
Even if it is not the optimal triangulation of q₁,q₂,...,qₙ (the one that maximizes the smallest angle), it should not be far from it, as long as q₁,q₂,...,qₙ are not that different from p₁,p₂,...,pₙ (which they are not supposed to be, if you matched both images consistently).
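A minimal sketch of that idea with scipy (assuming points1 and points2 are the matching landmark arrays, numpy arrays of shape (n, 2), with points1[k] corresponding to points2[k]):
import numpy as np
from scipy.spatial import Delaunay

# Triangulate the first point set only
tri = Delaunay(points1)

triangles1 = points1[tri.simplices]  # (m, 3, 2) triangles in image 1
triangles2 = points2[tri.simplices]  # the same index triplets applied to image 2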
So, the transformation matrices are the ones transforming coordinates between each pair of matching triangles (there is one transformation for each pair of matching triangles).
To decide which point (x',y') of second image matches point (x,y) of first image, you need
to identify in which triangle (i,j,k) (that is (pᵢ,pⱼ,pₖ)) (x,y) is,
Find barycentric coordinates of (x,y) inside this triangle: (x,y)=αpᵢ+βpⱼ+γpₖ
Assume that (x',y') have the same barycentric coordinates inside the matching triangle, that is (x',y')=αqᵢ+βqⱼ+γqₖ
Transformation matrix (for triangle (i,j,k)) is the one going from (x,y) to (x',y')
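A minimal sketch of the barycentric mapping for one pair of matching triangles (the map_point helper is illustrative, not from the question):
import numpy as np

def map_point(p, tri_src, tri_dst):
    """Map point p from triangle tri_src to the matching triangle tri_dst
    using barycentric coordinates. tri_src and tri_dst have shape (3, 2)."""
    # Solve [[x_i x_j x_k], [y_i y_j y_k], [1 1 1]] @ [a, b, g] = [x, y, 1]
    A = np.vstack((np.asarray(tri_src, dtype=float).T, np.ones(3)))
    a, b, g = np.linalg.solve(A, np.array([p[0], p[1], 1.0]))
    # p lies inside tri_src iff a, b and g are all in [0, 1]
    # The matching point has the same barycentric coordinates in tri_dst
    dst = np.asarray(tri_dst, dtype=float)
    return a * dst[0] + b * dst[1] + g * dst[2]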

How can I automatically find pairs of the same object from two images acquired by different techniques?

I have one slide with several objects deposited on top. I have taken two images of the same slide using different techniques (binocular and SEM). This means that the images vary in the imaged area, focus, resolution and perhaps angle of view. Despite these, we can easily identify pairs that belong to the same object, and some which are unpaired due to binary images pre-processing.
For instance, in the horizontally stacked image below (see all objects at the higher resolution images binocular and SEM) can be seen that 49 (left) and 71 (right) are the same object, and similarly for the pairs 56-75, 53-72, etc. But some, like object 79 (right), have no respective pair.
My goal is to automatically detect the pairs that belong to the same object. For this I tried using the code below to, first, resize the images based on four coordinates, which I manually click (source), and which belong to the two ends of a known object pair. Secondly, to find similarities between each of the objects using Match Shapes, and sorting them in ascending order, since the pairs closer to 0 are those more similar according to the opencv function.
# Import libraries, declare variables and import images
import cv2
import numpy as np
import operator
import os
import math
import re

coordinates = []
match = {}
size = 30

# Match contours and images
thr_sem = cv2.imread('https://i.stack.imgur.com/VC7GO.jpg', cv2.IMREAD_GRAYSCALE)
thr_bino = cv2.imread('https://i.stack.imgur.com/9ILWX.jpg', cv2.IMREAD_GRAYSCALE)

# Find coordinates for an object pair from both images
def click_event(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        print(x, ",", y)
        coordinates.append([x, y])
        font = cv2.FONT_HERSHEY_SIMPLEX
        strXY = str(x) + ", " + str(y)
        cv2.putText(thr_bino, strXY, (x, y), font, 3, (255, 255, 0), 2)
        cv2.imshow("Bino thumbnail", thr_bino)
    if event == cv2.EVENT_RBUTTONDOWN:
        blue = thr_bino[y, x, 0]
        green = thr_bino[y, x, 1]
        red = thr_bino[y, x, 2]
        font = cv2.FONT_HERSHEY_SIMPLEX
        strBGR = str(blue) + ", " + str(green) + "," + str(red)
        cv2.putText(thr_bino, strBGR, (x, y), font, 3, (0, 255, 255), 2)
        cv2.imshow("Bino thumbnail", thr_bino)
    cv2.imshow("SEM original", thr_sem)
    # calling the mouse click event
    cv2.setMouseCallback("Bino thumbnail", click_event)
    cv2.setMouseCallback("SEM original", click_event)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    # calling the mouse click event
    return coordinates

def call_click():
    cv2.imshow("Bino thumbnail", thr_bino)
    cv2.imshow("SEM original", thr_sem)
    # calling the mouse click event
    cv2.setMouseCallback("Bino thumbnail", click_event)
    cv2.setMouseCallback("SEM original", click_event)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    print(coordinates)

# Resize based on coordinates
def resize(coordinates):
    coordinates_list = [item for sublist in coordinates for item in sublist]
    print(coordinates_list)
    small = coordinates_list[:4]
    x1, y1, x2, y2 = tuple(small)
    h_small = y2 - y1
    big = coordinates_list[4:]
    x1, y1, x2, y2 = tuple(big)
    h_big = y2 - y1
    ratio = h_big / h_small
    h, w = thr_sem.shape
    dim = (int(w / ratio), int(h / ratio))
    resized_sem = cv2.resize(thr_sem, dim, interpolation=cv2.INTER_AREA)
    return resized_sem

# Find matches based on cv2.matchShapes
def find_and_export_matches(thr_sem, thr_bino):
    resized_sem = resize(coordinates)
    prelim_sem, hier = cv2.findContours(resized_sem, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    prelim_bino, hier = cv2.findContours(thr_bino, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    cnts_sem = [contour for contour in prelim_sem if len(contour) > size]
    cnts_bino = [contour for contour in prelim_bino if len(contour) > size]
    for idx, cnt_sem in enumerate(cnts_sem):
        for iddx, cnt_bino in enumerate(cnts_bino):
            ret = cv2.matchShapes(cnt_sem, cnt_bino, 1, 0.0)
            match[f'sem{idx}-bino{iddx}'] = ret
    matches = {k: v for (k, v) in match.items() if v < 0.5}
    matches = dict(sorted(matches.items(), key=operator.itemgetter(1), reverse=False))
    print(matches)
    # Export the more similar images (e.g. 5 here)
    for idx, (key, value) in enumerate(list(matches.items())[:5]):
        sem_idx = key.split('-')[0]
        sem_idx = int(re.findall(r"\d+", sem_idx)[0])
        bino_idx = key.split('-')[1]
        bino_idx = int(re.findall(r"\d+", bino_idx)[0])
        sem_cnt = cnts_sem[sem_idx]
        bino_cnt = cnts_bino[bino_idx]
        ext_left = tuple(sem_cnt[sem_cnt[:, :, 0].argmin()][0])
        ext_right = tuple(sem_cnt[sem_cnt[:, :, 0].argmax()][0])
        ext_top = tuple(sem_cnt[sem_cnt[:, :, 1].argmin()][0])
        ext_bot = tuple(sem_cnt[sem_cnt[:, :, 1].argmax()][0])
        margin = int((ext_bot[1] - ext_top[1]) / 10)
        try:
            cropped_sem = thr_sem[ext_top[1]-margin:ext_bot[1]+margin,
                                  ext_left[0]-margin:ext_right[0]+margin]
            cv2.imwrite(f'./match/{idx}_sem.jpg', cropped_sem)
            ext_left = tuple(bino_cnt[bino_cnt[:, :, 0].argmin()][0])
            ext_right = tuple(bino_cnt[bino_cnt[:, :, 0].argmax()][0])
            ext_top = tuple(bino_cnt[bino_cnt[:, :, 1].argmin()][0])
            ext_bot = tuple(bino_cnt[bino_cnt[:, :, 1].argmax()][0])
            margin = int((ext_bot[1] - ext_top[1]) / 6)
            cropped_bino = thr_bino[ext_top[1]-margin:ext_bot[1]+margin,
                                    ext_left[0]-margin:ext_right[0]+margin]
            cv2.imwrite(f'./match/{idx}_bino.jpg', cropped_bino)
        except:
            pass

def process():
    call_click()
    find_and_export_matches(thr_sem, thr_bino)

process()
Unfortunately, the pairs closer to 0
'sem57-bino43': 0.0016109668185579906, 'sem8-bino56': 0.003940798999951367, 'sem83-bino46': 0.004857856469884181, 'sem33-bino20': 0.005706075224340301,'sem67-bino60': 0.0074210065120114965
don't belong to the same object:
I would appreciate help to detect the pairs that belong to the same object, either by suggesting a technique I did not come across, or by solving the task itself. Something I did not exploit is the fact that the geometrical relations from the particles between one image and the other should be similar. I looked into "image overlaying" processes elsewhere for that purpose but unsuccessfully.
My suggestion:
Unless your example is not representative, the two images are fairly well pre-aligned. So in the first place, you should only look for a matching shape in small neighborhoods of the other image*.
Anyway, only the matching of irregular shapes, in particular shapes very different from round ones, is reliable. So I would consider a shape similarity metric, and for every shape use the best match but also the second best. If the two matching scores are very close, you can assume that the recognition was not safe.
Now sort the parts by decreasing "safeness" and pick the first two, three or four. Using these, you can compute a similarity, affine or homographic transformation that maps one image to the other. This way you can very easily predict the correspondences between any point of both images, and assist further matching or just detect overlaps.
*If not, use a scale-invariant matching method over the whole image and proceed as above.
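A minimal sketch of those last two steps, with made-up scores and coordinates purely for illustration (the 0.6 "safeness" ratio is an arbitrary choice):
import cv2
import numpy as np

# Hypothetical matchShapes scores of one SEM contour against every binocular contour:
# accept the match only if the best score clearly beats the second best
scores = np.array([0.004, 0.035, 0.21, 0.46])
best, second = np.partition(scores, 1)[:2]
is_safe = best < 0.6 * second

# Hypothetical centroids of the two or three safest pairs
# (src_pts[i] in the SEM image corresponds to dst_pts[i] in the binocular image)
src_pts = np.float32([[512, 133], [404, 301], [250, 472]]).reshape(-1, 1, 2)
dst_pts = np.float32([[498, 140], [391, 312], [244, 480]]).reshape(-1, 1, 2)

# Estimate a similarity (rotation + scale + translation) transform from the safe pairs
M, inliers = cv2.estimateAffinePartial2D(src_pts, dst_pts)

# Predict where any SEM centroid should land in the binocular image; pair each
# remaining object with the nearest prediction, or flag it as unpaired
sem_centroids = np.float32([[330, 220], [600, 90]]).reshape(-1, 1, 2)
predicted = cv2.transform(sem_centroids, M).reshape(-1, 2)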

Finding the darkest region in a depth map using numpy and/or cv2

I am attempting to consistently find the darkest region in a series of depth map images generated from a video. The depth maps are generated using the PyTorch implementation here
Their sample run script generates a prediction of the same size as the input where each pixel is a floating point value, with the highest/brightest value being the closest. Standard depth estimation using ConvNets.
The depth prediction is then normalized as follows to make a png for review
bits = 2
depth_min = prediction.min()
depth_max = prediction.max()
max_val = (2**(8*bits))-1
out = max_val * (prediction - depth_min) / (depth_max - depth_min)
I am attempting to identify the darkest region in each image in the video, with the assumption that this region has the most "open space".
I've tried several methods:
cv2 template matching
Using cv2 template matching and minMaxLoc, I created a template of np.zeros((100, 100)), then applied the template similar to the docs
img2 = out.copy().astype("uint8")
template = np.zeros((100, 100)).astype("uint8")
w, h = template.shape[::-1]
res = cv2.matchTemplate(img2,template,cv2.TM_SQDIFF)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
top_left = min_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
val = out.max()
cv2.rectangle(out,top_left, bottom_right, int(val) , 2)
As you can see, this implementation is very inconsistent with many false positives
np.argmin
Using np.argmin(out, axis=1) which generates many indices. I take the first two, and write the word MIN at those coordinates
text = "MIN"
textsize = cv2.getTextSize(text, font, 1, 2)[0]
textX, textY = np.argmin(prediction, axis=1)[:2]
cv2.putText(out, text, (textX, textY), font, 1, (int(917*max_val), int(917*max_val), int(917*max_val)), 2)
This is less inconsistent but still lacking
np.argwhere
Using np.argwhere(prediction == np.min(prediction)), then writing the word MIN at the coordinates. I imagined this would give me the darkest pixel in the image, but this is not the case.
I've also thought of running a convolution operation with a kernel of 50x50, then taking the region with the smallest value as the darkest region
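That convolution idea could look something like the following sketch (reusing the normalized out array from above; the 50x50 kernel and the drawn window are just illustrative):
import cv2
import numpy as np

img2 = out.copy().astype("uint8")
# Average over 50x50 windows, then take the location of the smallest mean value
averaged = cv2.boxFilter(img2, ddepth=-1, ksize=(50, 50))
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(averaged)
top_left = (min_loc[0] - 25, min_loc[1] - 25)      # approximate 50x50 region around the minimum
bottom_right = (min_loc[0] + 25, min_loc[1] + 25)
cv2.rectangle(img2, top_left, bottom_right, 255, 2)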
My question is why are there inconsistencies and false positives. How can I fix that? Intuitively this seems like a very simple thing to do.
UPDATE
Thanks to Hans for the idea. Please follow this link to download the output depths in png format.
The minimum is not a single point but as a rule a larger area. argmin finds the first x and y (top left corner) of this area:
In case of multiple occurrences of the minimum values, the indices
corresponding to the first occurrence are returned.
What you need is the center of this minimum region. You can find it using moments. Sometimes you have multiple minimum regions for instance in frame107.png. In this case we take the biggest one by finding the contour with the largest area.
We still have some jumping markers, as sometimes there is a tiny area that is the minimum, e.g. in frame25.png. Therefore we use a minimum area threshold min_area, i.e. we don't use the absolute minimum region but the region with the smallest value among all regions greater than or equal to that threshold.
import numpy as np
import cv2
import glob

min_area = 500

for file in glob.glob("*.png"):
    img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
    for i in range(img.min(), 255):
        if np.count_nonzero(img == i) >= min_area:
            b = np.where(img == i, 1, 0).astype(np.uint8)
            break
    contours, _ = cv2.findContours(b, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    max_contour = max(contours, key=cv2.contourArea)
    m = cv2.moments(max_contour)
    x = int(m["m10"] / m["m00"])
    y = int(m["m01"] / m["m00"])
    out = cv2.circle(img, (x, y), 10, 255, 2)
    cv2.imwrite(file, out)
frame107 with five regions where the image is 0 shown with enhanced gamma:
frame25 with very small min region (red arrow), we take the fifth largest min region instead (white circle):
The result (for min_area=500) is still a bit jumpy at some places, but if you further increase min_area you'll get false results for frames with a very steeply descending (and hence small per value) dark area. Maybe you can use the time axis (frame number) to filter out frames where the location of the darkest region jumps back and forth within 3 frames.
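One possible sketch of that temporal filtering (assuming you collect the per-frame centres in a list of (x, y) tuples; the window size is a guess):
import numpy as np

def smooth_centers(centers, window=3):
    """Median-filter the (x, y) centres over the frame axis so that a single
    frame whose darkest region jumps elsewhere does not move the marker."""
    arr = np.asarray(centers, dtype=float)        # shape (n_frames, 2)
    smoothed = arr.copy()
    half = window // 2
    for t in range(len(arr)):
        lo, hi = max(0, t - half), min(len(arr), t + half + 1)
        smoothed[t] = np.median(arr[lo:hi], axis=0)  # per-coordinate median of the window
    return smoothed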

Connect the nearest points in segment and label segment

I am using OpenCV and skimage for document analysis of datasheets.
I am trying to segment out the shaded region separately.
I am currently able to segment out the part and number as different clusters.
Using felzenszwalb() from skimage I segment the parts:
import matplotlib.pyplot as plt
import numpy as np
from skimage.segmentation import felzenszwalb
from skimage.io import imread
img = imread('test.jpg')
segments_fz = felzenszwalb(img, scale=100, sigma=0.2, min_size=50)
print("Felzenszwalb number of segments {}".format(len(np.unique(segments_fz))))
plt.imshow(segments_fz)
plt.tight_layout()
plt.show()
But I am not able to connect them. Any idea on how to connect them methodically and label the corresponding segments with part and part number would be of great help.
Thanks in advance for your time – if I’ve missed out anything, over- or under-emphasised a specific point let me know in the comments.
Preliminaries
Some preliminary code:
%matplotlib inline
%load_ext Cython
import numpy as np
import cv2
from matplotlib import pyplot as plt
import skimage as sk
import skimage.morphology as skm
import itertools

def ShowImage(title, img, ctype):
    plt.figure(figsize=(20, 20))
    if ctype == 'bgr':
        b, g, r = cv2.split(img)        # get b, g, r
        rgb_img = cv2.merge([r, g, b])  # switch it to rgb
        plt.imshow(rgb_img)
    elif ctype == 'hsv':
        rgb = cv2.cvtColor(img, cv2.COLOR_HSV2RGB)
        plt.imshow(rgb)
    elif ctype == 'gray':
        plt.imshow(img, cmap='gray')
    elif ctype == 'rgb':
        plt.imshow(img)
    else:
        raise Exception("Unknown colour type")
    plt.axis('off')
    plt.title(title)
    plt.show()
For reference, here's your original image:
#Read in image
img = cv2.imread('part.jpg')
ShowImage('Original',img,'bgr')
Identifying Numbers
To simplify things, we'll want to classify pixels as being either on or off. We can do so with thresholding. Since our image contains two clear classes of pixels (black and white), we can use Otsu's method. We'll invert the colour scheme since the libraries we're using consider black pixels boring and white pixels interesting.
#Convert image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Apply Otsu's method to eliminate pixels of intermediate colour
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
ShowImage('Applying Otsu',thresh,'gray')
#Verify that pixels are either black or white and nothing in between
np.unique(thresh)
Our strategy will be to locate numbers and then follow the line(s) near them to parts and then to label those parts. Since, conveniently, all of the Arabic numerals are formed from contiguous pixels, we can start by finding the connected components.
ret, components = cv2.connectedComponents(thresh)
#Each component is a different colour
ShowImage('Connected Components', components, 'rgb')
We can then filter the connected components to find the numbers by filtering for dimension. Note that this is not a super robust method of doing this. A better option would be to use character recognition, but this is left as an exercise to the reader :-)
class Box:
    def __init__(self, x0, x1, y0, y1):
        self.x0, self.x1, self.y0, self.y1 = x0, x1, y0, y1

    def overlaps(self, box2, tol):
        if self.x0 is None or box2.x0 is None:
            return False
        return not (self.x1+tol<=box2.x0 or self.x0-tol>=box2.x1 or self.y1+tol<=box2.y0 or self.y0-tol>=box2.y1)

    def merge(self, box2):
        self.x0 = min(self.x0, box2.x0)
        self.x1 = max(self.x1, box2.x1)
        self.y0 = min(self.y0, box2.y0)
        self.y1 = max(self.y1, box2.y1)
        box2.x0 = None  # Used to mark `box2` as being no longer valid. It can be removed later

    def dist(self, x, y):
        # Get center point
        ax = (self.x0+self.x1)/2
        ay = (self.y0+self.y1)/2
        # Get distance to center point
        return np.sqrt((ax-x)**2+(ay-y)**2)

    def good(self):
        return not (self.x0 is None)

def ExtractComponent(original_image, component_matrix, component_number):
    """Extracts a component from a ConnectedComponents matrix"""
    # Create a true-false matrix indicating if a pixel is part of a particular component
    is_component = component_matrix == component_number
    # Find the coordinates of those pixels
    coords = np.argwhere(is_component)
    # Bounding box of non-black pixels.
    y0, x0 = coords.min(axis=0)
    y1, x1 = coords.max(axis=0) + 1  # slices are exclusive at the top
    # Get the contents of the bounding box.
    return x0, x1, y0, y1, original_image[y0:y1, x0:x1]

numbers_img = thresh.copy()  # This is used purely to show that we can identify numbers
numbers = []
for component in range(components.max()):
    tx0, tx1, ty0, ty1, this_component = ExtractComponent(thresh, components, component)
    # ShowImage('Component #{0}'.format(component), this_component, 'gray')
    cheight, cwidth = this_component.shape
    # print(cwidth, cheight)  # Enable this to see dimensions
    # Identify numbers based on aspect ratio
    if (abs(cwidth-14)<3 or abs(cwidth-7)<3) and abs(cheight-24)<3:
        numbers_img[ty0:ty1, tx0:tx1] = 128
        numbers.append(Box(tx0, tx1, ty0, ty1))

ShowImage('Numbers', numbers_img, 'gray')
We now connect the numbers into contiguous blocks by expanding their bounding boxes slightly and looking for overlaps.
#This is kind of a silly way to do this, but it will work fine for small quantities (hundreds)
merged = True        #If true, then a merge happened this round
while merged:        #Continue until there are no more mergers
    merged = False   #Reset merge indicator
    for a, b in itertools.combinations(numbers, 2):  #Consider all pairs of numbers
        if a.overlaps(b, 10):  #If this pair overlaps
            a.merge(b)         #Merge it
            merged = True      #Make a note that we've merged
numbers = [x for x in numbers if x.good()]  #Eliminate those boxes that were gobbled by the mergers

#This is used purely to show that we can identify numbers
numbers_img = thresh.copy()
for n in numbers:
    numbers_img[n.y0:n.y1, n.x0:n.x1] = 128
    thresh[n.y0:n.y1, n.x0:n.x1] = 0  #Drop numbers from thresholded image
ShowImage('Numbers', numbers_img, 'gray')
Okay, so now we've identified the numbers! We'll use these later to identify parts.
Identifying Arrows
Next, we'll want to figure out what parts the numbers are pointing to. To do so, we want to detect lines. The Hough transform is good for this. To reduce the number of false positives, we skeletonize the data, which transforms it into a representation which is at most one pixel wide.
skel = sk.img_as_ubyte(skm.skeletonize(thresh>0))
ShowImage('Skeleton', skel, 'gray')
Now we perform the Hough transform. We're looking for one that identifies all of the lines going from the numbers to the parts. Getting this right may take some fiddling with the parameters.
lines = cv2.HoughLinesP(
    skel,
    1,            #Resolution of r in pixels
    np.pi / 180,  #Resolution of theta in radians
    30,           #Minimum number of intersections to detect a line
    None,
    80,           #Min line length
    10            #Max line gap
)
lines = [x[0] for x in lines]

line_img = thresh.copy()
line_img = cv2.cvtColor(line_img, cv2.COLOR_GRAY2BGR)
for l in lines:
    color = tuple(map(int, np.random.randint(low=0, high=255, size=3)))
    cv2.line(line_img, (l[0], l[1]), (l[2], l[3]), color, 3, cv2.LINE_AA)

ShowImage('Lines', line_img, 'bgr')
We now want to find the line or lines which are closest to each number and retain only these. We're essentially filtering out all of the lines which are not arrows. To do so, we compare the end points of each line to the center point of each number box.
comp_labels = np.zeros(img.shape[0:2], dtype=np.uint8)

for n_idx, n in enumerate(numbers):
    distvals = []
    for i, l in enumerate(lines):
        #Distances from each point of line to midpoint of rectangle
        dists = [n.dist(l[0], l[1]), n.dist(l[2], l[3])]
        #Minimum distance and the end point (0 or 1) of the line associated with that point
        #Tuples of (Line Number, Line Point, Dist to Line Point) are produced
        distvals.append((i, np.argmin(dists), np.min(dists)))
    #Sort by distance between the number box and the line
    distvals = sorted(distvals, key=lambda x: x[2])
    #Include nearby lines, not just the closest one. This accounts for forking.
    distvals = [x for x in distvals if x[2] < 1.5*distvals[0][2]]
    #Draw a white rectangle where the number box was
    cv2.rectangle(comp_labels, (n.x0, n.y0), (n.x1, n.y1), 1, cv2.FILLED)
    #Draw white lines where the arrows are
    for dv in distvals:
        l = lines[dv[0]]
        lp = (l[0], l[1]) if dv[1]==0 else (l[2], l[3])
        cv2.line(comp_labels, (l[0], l[1]), (l[2], l[3]), 1, 3, cv2.LINE_AA)
        cv2.line(comp_labels, (lp[0], lp[1]), ((n.x0+n.x1)//2, (n.y0+n.y1)//2), 1, 3, cv2.LINE_AA)

ShowImage('Lines', comp_labels, 'gray')
Finding Parts
This part was hard! We now want to segment the parts in the image. If there was some way to disconnect the lines linking subparts together, this would be easy. Unfortunately, the lines connecting the subparts are the same width as many of the lines which constitute the parts.
To work around this, we could use a lot of logic. It would be painful and error-prone.
Alternatively, we could assume you have an expert-in-the-loop. This expert's sole job is to cut the lines connecting the subparts. This should be both easy and fast for them. Labeling everything would be slow and sad for humans, but is fast for computers. Separating things is easy for humans, but hard for computers. So we let both do what they do best.
In this case, you could probably train someone to do this job in a few minutes, so a true "expert" isn't really necessary. Just a mildly competent human.
If you pursue this, you'll need to write the expert in the loop tool. To do so, save the skeleton images, have your expert modify them, and read the skeletonized images back in. Like so.
#Save the image, or display it on a GUI
#cv2.imwrite("/z/skel.png", skel);
#EXPERT DOES THEIR THING HERE
#Read the expert-mediated image back in
skelhuman = cv2.imread('/z/skel.png')
#Convert back to the form we need
skelhuman = cv2.cvtColor(skelhuman,cv2.COLOR_BGR2GRAY)
ret, skelhuman = cv2.threshold(skelhuman,0,255,cv2.THRESH_OTSU)
ShowImage('SkelHuman', skelhuman, 'gray')
Now that we have the parts separated, we'll eliminate as much of the arrows as possible. We've already extracted these above, so we can add them back later if we need to.
To eliminate the arrows, we'll find all of the lines that terminate in locations other than by another line. That is, we'll locate pixels which have only one neighbouring pixel. We'll then eliminate the pixel and look at its neighbour. Doing this iteratively eliminates the arrows. Since I don't know another term for it, I'll call this a Fuse Transform. Since this will require manipulating individual pixels, which would be super slow in Python, we'll write the transform in Cython.
%%cython -a --cplus
import cython
from libcpp.queue cimport queue
import numpy as np
cimport numpy as np

#cython.boundscheck(False)
#cython.wraparound(False)
#cython.nonecheck(False)
#cython.cdivision(True)
cpdef void FuseTransform(unsigned char [:, :] image):
    # set the variable extension types
    cdef int c, x, y, nx, ny, width, height, neighbours
    cdef queue[int] q

    # grab the image dimensions
    height = image.shape[0]
    width = image.shape[1]

    cdef int dx[8]
    cdef int dy[8]
    #Offsets to neighbouring cells
    dx[:] = [-1, -1, 0, 1, 1, 1, 0, -1]
    dy[:] = [0, -1, -1, -1, 0, 1, 1, 1]

    #Find seed cells: those with only one neighbour
    for y in range(1, height-1):
        for x in range(1, width-1):
            if image[y, x] == 0:  #Seed cells cannot be blank cells
                continue
            neighbours = 0
            for n in range(0, 8):  #Looks at all neighbours
                nx = x+dx[n]
                ny = y+dy[n]
                if image[ny, nx] > 0:  #This neighbour has a value
                    neighbours += 1
            if neighbours == 1:    #Was there only one neighbour?
                q.push(y*width+x)  #If so, this is a seed cell

    #Starting with the seed cells, gobble up the lines
    while not q.empty():
        c = q.front()
        q.pop()
        y = c//width  #Convert flat index into 2D x-y index
        x = c%width
        image[y, x] = 0  #Gobble up this part of the fuse
        neighbour = -1   #No neighbours yet
        for n in range(0, 8):  #Look at all neighbours
            nx = x+dx[n]  #Find coordinates of neighbour cells
            ny = y+dy[n]
            #If the neighbour would be off the side of the matrix, ignore it
            if nx < 0 or ny < 0 or nx == width or ny == height:
                continue
            if image[ny, nx] > 0:    #Is the neighbouring cell active?
                if neighbour != -1:  #If we've already found an active neighbour
                    neighbour = -1   #Then pretend we found no neighbours
                    break            #And stop looking. This is the end of the fuse.
                else:                #Otherwise, make a note of the neighbour's index.
                    neighbour = ny*width+nx
        if neighbour != -1:    #If there was only one neighbour
            q.push(neighbour)  #Continue burning the fuse
Back in standard Python:
#Apply the Fuse Transform
skh_dilated=skelhuman.copy()
FuseTransform(skh_dilated)
ShowImage('Fuse Transform', skh_dilated, 'gray')
Now that we've eliminated all of the arrows and lines connecting the parts, we dilate the remaining pixels a lot.
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(skh_dilated, kernel, iterations=6)
ShowImage('Dilation', dilated, 'gray')
Putting It All Together
And overlay the labels and arrows we segmented out earlier...
comp_labels_dilated = cv2.dilate(comp_labels, kernel, iterations=5)
labels_combined = np.uint8(np.logical_or(comp_labels_dilated,dilated))
ShowImage('Comp Labels', labels_combined, 'gray')
Finally, we take the merged number boxes, component arrows, and parts and color each of them using pretty colors from Color Brewer. We then overlay this on the original image to obtain the desired highlighting.
ret, labels = cv2.connectedComponents(labels_combined)
colormask = np.zeros(img.shape, dtype=np.uint8)
#Colors from Color Brewer
colors = [(228,26,28),(55,126,184),(77,175,74),(152,78,163),(255,127,0),(255,255,51),(166,86,40),(247,129,191),(153,153,153)]
for l in range(labels.max()):
    if l == 0:  #Background component
        colormask[labels == 0] = (255, 255, 255)
    else:
        colormask[labels == l] = colors[l]
ShowImage('Comp Labels', colormask, 'bgr')
blended = cv2.addWeighted(img,0.7,colormask,0.3,0)
ShowImage('Blended', blended, 'bgr')
The final image
So, to recap, we identified numbers, arrows, and parts. In some cases, we were able to separate them automatically. In other cases, we used an expert in the loop. Where we had to manipulate pixels individually, we used Cython for speed.
Of course, the danger with this sort of thing is that some other image will break the (many) assumptions I've made here. But that's a risk that you take when you try to use a single image to present a problem.
