Do you know of an algorithm that can detect whether there is handwriting on an image? I am not interested in knowing what the handwriting says, only that some handwriting is present.
I have a video of someone filling a slide with handwriting. My goal is to determine how much of the slide has been filled with handwriting already.
The video in question can be downloaded here: http://www.filedropper.com/00_6
For this particular video, a great solution was already suggested in Quantify how much a slide has been filled with handwriting
The solution is based on summing the amount of the specific color used for the handwriting. However, if the handwriting is not in blue but in some other color that can also be found in non-handwriting content, this approach will not work.
Therefore, I would like to know whether there is a more general way to determine whether handwriting is present on an image.
What I have done so far:
I was thinking of extracting the contours of the image and then somehow detecting the handwriting based on how curvy the contours are (but I have no clue how to do that part). It might not be the best idea, though, as it would again not always be correct...
import cv2
import matplotlib.pyplot as plt
img = cv2.imread(PATH TO IMAGE)
print("img shape=", img.shape)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow("image", gray)
cv2.waitKey(1)
#### extract all contours
# Find Canny edges
edged = cv2.Canny(gray, 30, 200)
cv2.waitKey(0)
# Finding Contours
# Use a copy of the image e.g. edged.copy()
# since findContours alters the image
contours, hierarchy = cv2.findContours(edged,
    cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cv2.imshow('Canny Edges After Contouring', edged)
cv2.waitKey(0)
print("Number of Contours found = " + str(len(contours)))
# Draw all contours
# -1 signifies drawing all contours
cv2.drawContours(img, contours, -1, (0, 255, 0), 3)
cv2.imshow('Contours', img)
cv2.waitKey(0)
You can identify the space taken up by handwriting by masking out the pixels that belong to the template, and then doing the same for the difference between later frames and the template. You can use dilation, thresholding, and morphological closing for this.
Let's start with your template. Let's identify the parts we will mask:
import cv2
import numpy as np
template = cv2.imread('template.jpg')
Now, let's broaden the occupied pixels to make a zone that we will mask (hide) later:
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
kernel = np.ones((5, 5),np.uint8)
dilation = cv2.dilate(255 - template, kernel,iterations = 5)
Then, we will threshold to turn this into a black and white mask:
_, thresh = cv2.threshold(dilation,25,255,cv2.THRESH_BINARY_INV)
In later frames, we will apply this mask to the picture by turning all of the masked pixels white. For instance:
import numpy as np
import cv2
vidcap = cv2.VideoCapture('0_0.mp4')
success,image = vidcap.read()
count = 0
frames = []
while count < 500:
    frames.append(image)
    success,image = vidcap.read()
    count += 1
mask = np.where(thresh == 0)
example = frames[300]
example[mask] = [255, 255, 255]
cv2.imshow('', example)
cv2.waitKey(0)
Now, we will create a function that returns the difference between the template and a given picture. We will also use morphological closing to get rid of the leftover single pixels that would make the result noisy.
def difference_with_mask(image):
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5), np.uint8)
    dilation = cv2.dilate(255 - grayscale, kernel, iterations=5)
    _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
    thresh[mask] = 255
    closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    return closing
cv2.imshow('', difference_with_mask(frames[400]))
cv2.waitKey(0)
To address the fact that you don't want the hand detected as handwriting, I suggest that instead of applying the function to each individual frame on its own, you apply it to the 95th percentile of the last 15 sampled frames (sampling one frame out of every 30). Look at this:
from collections import deque

history = deque(maxlen=15)
results = []
for ix, frame in enumerate(frames):
    if ix % 30 == 0:
        history.append(frame)
    results.append(np.quantile(history, 0.95, axis=0))
    print(ix)
Now, the example frame becomes this (the hand is removed because it was not present in most of the last 15 sampled frames):
As you can see, a little part of the handwriting is missing. It will appear later, because of the time-dependent percentile transformation we are doing. You will see: in my example with frame 18,400, the text that is missing in the image above is present. Then, you can use the function I gave you, and this will be the result:
And here we go! Note that this solution, which excludes the hand, will take longer to compute because there are a few extra calculations to do. Processing a single image with no regard for the hand would be nearly instantaneous, to the extent that you could probably run it on your webcam feed in real time.
Final Example:
Here's the frame 18,400:
Final image:
You can play with the function if you want the mask to wrap more thinly around the text:
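For instance, a possible variation (my own assumption, not part of the original run) is to use a smaller kernel and fewer dilation iterations so the white zone hugs the strokes more closely:
def difference_with_tight_mask(image):
    # Same idea as difference_with_mask, but with a smaller kernel and fewer
    # dilation iterations so the mask wraps more thinly around the text.
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((3, 3), np.uint8)
    dilation = cv2.dilate(255 - grayscale, kernel, iterations=2)
    _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
    thresh[mask] = 255
    return cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)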
Full code:
import os
import numpy as np
import cv2
vidcap = cv2.VideoCapture('0_0.mp4')
success,image = vidcap.read()
count = 0
from collections import deque
frames = deque(maxlen=700)
while count < 500:
    frames.append(image)
    success,image = vidcap.read()
    count += 1
template = cv2.imread('template.jpg')
template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
kernel = np.ones((5, 5),np.uint8)
dilation = cv2.dilate(255 - template, kernel,iterations = 5)
cv2.imwrite('dilation.jpg', dilation)
cv2.imshow('', dilation)
cv2.waitKey(0)
_, thresh = cv2.threshold(dilation,25,255,cv2.THRESH_BINARY_INV)
cv2.imwrite('thresh.jpg', thresh)
cv2.imshow('', thresh)
cv2.waitKey(0)
mask = np.where(thresh == 0)
example = frames[400]
cv2.imwrite('original.jpg', example)
cv2.imshow('', example)
cv2.waitKey(0)
example[mask] = 255
cv2.imwrite('example_masked.jpg', example)
cv2.imshow('', example)
cv2.waitKey(0)
def difference_with_mask(image):
    grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((5, 5), np.uint8)
    dilation = cv2.dilate(255 - grayscale, kernel, iterations=5)
    _, thresh = cv2.threshold(dilation, 25, 255, cv2.THRESH_BINARY_INV)
    thresh[mask] = 255
    closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    return closing
cv2.imshow('', difference_with_mask(frames[400]))
cv2.waitKey(0)
masked_example = difference_with_mask(frames[400])
cv2.imwrite('masked_example.jpg', masked_example)
from collections import deque
history = deque(maxlen=15)
results = []
for ix, frame in enumerate(frames):
    if ix % 30 == 0:
        history.append(frame)
    results.append(np.quantile(history, 0.95, axis=0))
    print(ix)
    if ix > 500:
        break
cv2.imshow('', frames[400])
cv2.waitKey(0)
cv2.imshow('', results[400].astype(np.uint8))
cv2.imwrite('percentiled_frame.jpg', results[400].astype(np.uint8))
cv2.waitKey(0)
cv2.imshow('', difference_with_mask(results[400].astype(np.uint8)))
cv2.imwrite('final.jpg', difference_with_mask(results[400].astype(np.uint8)))
cv2.waitKey(0)
You could try to build a template before detection, which you could then subtract from the current frame of the video. One way to make such a template is to iterate through every pixel of each frame and check whether it has a higher (whiter) value at that coordinate than the value that is stored in the list.
Here is an example of such a template from your video by iterating through the first two seconds:
Once you have that, it is simple to detect the text. You can use the cv2.absdiff() function to compute the difference between the template and the frame. Here is an example:
Once you have this image, it is trivial to search for writing (threshold + contour search or something similar).
Here is an example code:
import numpy as np
import cv2
cap = cv2.VideoCapture('0_0.mp4') # read video
bgr = cap.read()[1] # get first frame
frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY) # transform to grayscale
template = frame.copy() # make a copy of the grayscale
h, w = frame.shape[:2] # height, width
matrix = []  # a list for [y, x] coordinates
# fill matrix with all coordinates of the image (height x width)
for j in range(h):
    for i in range(w):
        matrix.append([j, i])
fps = cap.get(cv2.CAP_PROP_FPS) # frames per second of the video
seconds = 2 # How many seconds of the video you wish to look the template for
k = seconds * fps # calculate how many frames of the video is in that many seconds
i = 0 # some iterator to count the frames
lowest = []  # list that will store the highest (whitest) value seen for each pixel of the frame - that will build our template
# store the value of the first frame - just so you can compare it in the next step
for j in matrix:
    y = j[0]
    x = j[1]
    lowest.append(template[y, x])
# loop through the number of frames calculated before
while i < k:
    bgr = cap.read()[1]  # bgr image
    frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # transform to grayscale
    # iterate through every pixel (pixels are located in the matrix)
    for l, j in enumerate(matrix):
        y = j[0]  # y coordinate
        x = j[1]  # x coordinate
        cur = frame[y, x]  # value of the pixel in the current frame
        if cur > lowest[l]:  # if the current frame has a higher (whiter) value, store it in the "lowest" list
            lowest[l] = cur
    i += 1  # increment the iterator
    # just for visualization
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
i = 0  # new iterator to increment position in the "lowest" list
template = np.ones((h, w), dtype=np.uint8)*255  # new empty white image
# iterate through the matrix and set each pixel of the new white image
# to the value stored in the "lowest" list
for j in matrix:
    template[j[0], j[1]] = lowest[i]
    i += 1
# just for visualization - template
cv2.imwrite("template.png", template)
cv2.imshow("template", template)
cv2.waitKey(0)
cv2.destroyAllWindows()
counter = 0  # contour counter: logically, if the number of contours
# rapidly decreases, that means a new template is in order
mean_compare = 0  # this is needed for a simple color check, to test whether a contour
# has the same color as the others
# below is the difference between the frame of the video and the created template
while cap.isOpened():
    bgr = cap.read()[1]  # bgr image
    frame = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # grayscale
    img = cv2.absdiff(template, frame)  # resulting difference
    thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]  # thresholded image
    kernel = np.ones((5, 5), dtype=np.uint8)  # simple kernel
    thresh = cv2.dilate(thresh, kernel, iterations=1)  # dilate thresholded image
    cnts, h = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # contour search
    if len(cnts) < counter*0.5 and counter > 50:  # check if a new template is in order
        # search for a new template again
        break
    else:
        counter = len(cnts)  # update counter
        for cnt in cnts:  # iterate through contours
            size = cv2.contourArea(cnt)  # size of the contour - to filter out noise
            if 20 < size < 30000:  # noise criterion
                mask = np.zeros(frame.shape, np.uint8)  # empty mask - needed for color comparison
                cv2.drawContours(mask, [cnt], -1, 255, -1)  # draw contour on mask
                mean = cv2.mean(bgr, mask=mask)  # the mean color of the contour
                if not mean_compare:  # the first contour sets the reference color
                    mean_compare = mean
                else:
                    k1 = 0.85  # coefficient for how much smaller each channel's value may be
                    k2 = 1.15  # coefficient for how much bigger each channel's value may be
                    # condition
                    b = bool(mean_compare[0] * k1 < mean[0] < mean_compare[0] * k2)
                    g = bool(mean_compare[1] * k1 < mean[1] < mean_compare[1] * k2)
                    r = bool(mean_compare[2] * k1 < mean[2] < mean_compare[2] * k2)
                    if b and g and r:
                        cv2.drawContours(bgr, [cnt], -1, (0, 255, 0), 2)  # draw on rgb image
    # just for visualization
    cv2.imshow('img', bgr)
    if cv2.waitKey(1) & 0xFF == ord('s'):
        cv2.imwrite(str(j)+".png", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# release the video object and destroy window
cap.release()
cv2.destroyAllWindows()
One possible result with a simple size and color filter:
NOTE: This template search algorithm is very slow because of the nested loops and can probably be optimized to make it faster (see the vectorized sketch after this answer). Also, you will need to add a check for whether the template changes within the same video - I'm guessing that shouldn't be too difficult.
A simpler idea for making it a bit faster is to resize the frames to, say, 20% and run the same template search. Afterwards, resize the template back to the original size and dilate it. The result will not be as nice, but it will give a mask of where the text and lines of the template are. Then simply draw it over the frame.
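As a side note on the speed concern above: here is a minimal vectorized sketch (my own variation, not part of the answer's tested code) that builds the same per-pixel maximum template with np.maximum instead of the nested loops:
import cv2
import numpy as np

cap = cv2.VideoCapture('0_0.mp4')
template = cv2.cvtColor(cap.read()[1], cv2.COLOR_BGR2GRAY)  # start from the first frame
fps = cap.get(cv2.CAP_PROP_FPS)
for _ in range(int(2 * fps)):  # look through the first two seconds of the video
    ret, bgr = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    template = np.maximum(template, gray)  # keep the whitest value seen for every pixel
cv2.imwrite('template.png', template)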
I don't think you really need the code in this case and it would be rather long if you did. But here's an algorithm to do it.
Use OpenCV's EAST (Efficient and Accurate Scene Text detector) model at the beginning to establish the starting text on the slide. That gives you the bounding box(es) and the initial percentage of the slide covered by slide text, as opposed to handwritten explanatory text.
Every, say 1-5 seconds (people don't write all that fast), compare that baseline image with the current image and the previous image.
If the current image has more text than the previous image but the initial bounding boxes are NOT the same, you have a new and rather busy slide.
If the current image has more text than the previous image and the initial bounding boxes ARE the same, more text is being added.
If the current image has less text than the previous image but the initial bounding boxes are NOT the same, you again have a new slide -- only not as busy, and with space to write like the previous one.
If the current image has less text than the previous image and the initial bounding boxes ARE the same, you either have a duplicate slide with what will presumably be more text, or the teacher is erasing a section to continue or modify their explanation. Meaning, you'll need some way of addressing this.
When you have a new slide, take the previous image, and compare the bounding boxes of all text, subtracting the boxes for the initial state.
Computationally, this isn't going to be cheap (you certainly won't be doing this live, at least not for a number of years), but it's robust, and only sampling the text every so many seconds will help.
Personally, I would approach this as an ensemble: establish the initial bounding boxes, then look at the color of the text. If you can get away with measuring the percentage of differently colored text, do so; and when you can't, the bounding-box approach will still hold up.
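For illustration only, here is a minimal sketch of the initial EAST step (it assumes the pre-trained frozen_east_text_detection.pb model file is available; estimating coverage straight from the score map is my own simplification rather than the usual full bounding-box decoding):
import cv2
import numpy as np

# Load the pre-trained EAST text detector (model path is an assumption)
net = cv2.dnn.readNet('frozen_east_text_detection.pb')
img = cv2.imread('slide_frame.png')  # hypothetical frame grabbed from the video
# EAST requires input dimensions that are multiples of 32
blob = cv2.dnn.blobFromImage(img, 1.0, (640, 640), (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(['feature_fusion/Conv_7/Sigmoid', 'feature_fusion/concat_3'])
# Rough coverage estimate: fraction of score-map cells confident that text is present
coverage = float((scores[0, 0] > 0.5).mean())
print('approximate fraction of the slide covered by text:', coverage)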
In addition to the great answers that people provided, I have two other suggestions.
The first one is the CNN approach. It is entirely workable to use an object detection routine, or even a segmentation method (like U-Net), to differentiate between the two kinds of text. It is feasible because you can find millions of images of digital textbooks, as well as handwritten documents, to train your model.
The second approach is to locate and extract every single symbol in the image separately (with a simple method like the one you have used so far, or with connected components). Since typographic letters and symbols have a consistent shape and style (similar fonts, unlike handwritten letters), you can match all of the found letters against sample typographic letters that you gathered separately, to distinguish the handwritten from the typographic. Feature-point-based matching (like SURF) could be a good tool for this approach; a symbol-extraction sketch follows below.
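To illustrate the symbol-extraction step of the second approach, here is a minimal sketch using cv2.connectedComponentsWithStats (the Otsu threshold and the size filter are assumptions to tune on your slides):
import cv2
import numpy as np

img = cv2.imread('slide_frame.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
# Binarize so the symbols become white blobs on a black background
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Label every connected blob (candidate symbol)
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)
symbols = []
for i in range(1, num_labels):  # label 0 is the background
    x, y, w, h, area = stats[i]
    if 20 < area < 5000:  # assumed size filter to skip noise and large graphics
        symbols.append(binary[y:y+h, x:x+w])
print('extracted', len(symbols), 'candidate symbols to match against typographic samples')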
Related
I am trying to remove the checkered background (which represents transparent background in Adobe Illustrator and Photoshop) with transparent color (alpha channel) in some PNGs with Python script.
First, I use template matching:
import cv2
import numpy as np
from matplotlib import pyplot as plt
img_rgb = cv2.imread('testimages/fake1.png', cv2.IMREAD_UNCHANGED)
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('pattern.png', 0)
w, h = template.shape[::-1]
res = cv2.matchTemplate(img_gray, template, cv2.TM_CCOEFF_NORMED)
threshold = 0.8
loc = np.where( res >= threshold)
for pt in zip(*loc[::-1]):
    if len(img_rgb[0][0]) == 3:
        # add alpha channel
        rgba = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2RGBA)
        rgba[:, :, 3] = 255  # default not transparent
        img_rgb = rgba
    # replace the area with a transparent rectangle
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (255, 255, 255, 0), -1)
cv2.imwrite('result.png', img_rgb)
Source Image: fake1.png
Pattern Template: pattern.png
Output: result.png (the gray area is actually transparent; enlarge a bit for viewing easier)
I know this approach has problems, because in some cases the template cannot be identified fully, as part of the pattern is hidden by the graphics in the PNG image.
My question is: how can I match such a pattern perfectly using OpenCV? Via FFT filtering?
References:
How particular pixel to transparent in opencv python?
Detecting a pattern in an image and retrieving its position
https://python.plainenglish.io/how-to-remove-image-background-using-python-6f7ffa8eab15
https://answers.opencv.org/question/232506/make-the-background-of-the-image-transparent-using-a-mask/
https://dsp.stackexchange.com/questions/36679/which-image-filter-can-be-applied-to-remove-gridded-pattern-from-corrupt-jpegs
Here is one way to do that in Python/OpenCV simply by thresholding on the checks color range.
Input:
import cv2
import numpy as np
# read input
img = cv2.imread("fake.png")
# threshold on checks
low = (230,230,230)
high = (255,255,255)
mask = cv2.inRange(img, low, high)
# invert alpha
alpha = 255 - mask
# convert img to BGRA
result = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)
result[:,:,3] = alpha
# save output
cv2.imwrite('fake_transparent.png', result)
cv2.imshow('img', img)
cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Download the resulting image to see that it is actually transparent.
Here is one way to use DFT to process the image in Python/OpenCV/Numpy. One does need to know the size of the checkerboard pattern (light or dark square size).
Read the input
Separate channels
Apply DFT to each channel
Shift origin from top left to center of each channel
Extract magnitude and phase images from each channel
Define the checkerboard pattern size
Create a black and white checkerboard image of the same size
Apply similar DFT processing to the checkerboard image
Get the spectrum from the log(magnitude)
Threshold the spectrum to form a mask
Zero out the DC center point in the mask
OPTION: If needed apply morphology dilate to thicken the white dots. But does not seem to be needed here
Invert the mask so the background is white and the dots are black
Convert the mask to range 0 to 1 and make 2 channels
Apply the two-channel mask to the center shifted DFT channels
Shift the center back to the top left in each masked image
Do the IDFT to get back from complex domain to real domain on each channel
Merge the resulting channels back to a BGR image as the final reconstituted image
Save results
Input:
import numpy as np
import cv2
import math
# read input
# note: opencv fft only works on grayscale
img = cv2.imread('fake.png')
hh, ww = img.shape[:2]
# separate channels
b,g,r = cv2.split(img)
# convert images to floats and do dft saving as complex output
dft_b = cv2.dft(np.float32(b), flags = cv2.DFT_COMPLEX_OUTPUT)
dft_g = cv2.dft(np.float32(g), flags = cv2.DFT_COMPLEX_OUTPUT)
dft_r = cv2.dft(np.float32(r), flags = cv2.DFT_COMPLEX_OUTPUT)
# apply shift of origin from upper left corner to center of image
dft_b_shift = np.fft.fftshift(dft_b)
dft_g_shift = np.fft.fftshift(dft_g)
dft_r_shift = np.fft.fftshift(dft_r)
# extract magnitude and phase images
mag_b, phase_b = cv2.cartToPolar(dft_b_shift[:,:,0], dft_b_shift[:,:,1])
mag_g, phase_g = cv2.cartToPolar(dft_g_shift[:,:,0], dft_g_shift[:,:,1])
mag_r, phase_r = cv2.cartToPolar(dft_r_shift[:,:,0], dft_r_shift[:,:,1])
# set check size (size of either dark or light square)
check_size = 15
# create checkerboard pattern
white = np.full((check_size,check_size), 255, dtype=np.uint8)
black = np.full((check_size,check_size), 0, dtype=np.uint8)
checks1 = np.hstack([white,black])
checks2 = np.hstack([black,white])
checks3 = np.vstack([checks1,checks2])
numht = math.ceil(hh / (2*check_size))
numwd = math.ceil(ww / (2*check_size))
checks = np.tile(checks3, (numht,numwd))
checks = checks[0:hh, 0:ww]
# apply dft to checkerboard pattern
dft_c = cv2.dft(np.float32(checks), flags = cv2.DFT_COMPLEX_OUTPUT)
dft_c_shift = np.fft.fftshift(dft_c)
mag_c, phase_c = cv2.cartToPolar(dft_c_shift[:,:,0], dft_c_shift[:,:,1])
# get spectrum from magnitude (add tiny amount to avoid divide by zero error)
spec = np.log(mag_c + 0.00000001)
# threshold spectrum
mask = cv2.threshold(spec, 1, 255, cv2.THRESH_BINARY)[1]
# mask DC point (center spot)
centx = int(ww/2)
centy = int(hh/2)
dot = np.zeros((3,3), dtype=np.uint8)
mask[centy-1:centy+2, centx-1:centx+2] = dot
# If needed do morphology dilate by small amount.
# But does not seem to be needed in this case
# invert mask
mask = 255 - mask
# apply mask to real and imaginary components
mask1 = (mask/255).astype(np.float32)
mask2 = cv2.merge([mask1,mask1])
complex_b = dft_b_shift*mask2
complex_g = dft_g_shift*mask2
complex_r = dft_r_shift*mask2
# shift origin from center to upper left corner
complex_ishift_b = np.fft.ifftshift(complex_b)
complex_ishift_g = np.fft.ifftshift(complex_g)
complex_ishift_r = np.fft.ifftshift(complex_r)
# do idft with normalization saving as real output and crop to original size
img_notch_b = cv2.idft(complex_ishift_b, flags=cv2.DFT_SCALE+cv2.DFT_REAL_OUTPUT)
img_notch_b = img_notch_b.clip(0,255).astype(np.uint8)
img_notch_b = img_notch_b[0:hh, 0:ww]
img_notch_g = cv2.idft(complex_ishift_g, flags=cv2.DFT_SCALE+cv2.DFT_REAL_OUTPUT)
img_notch_g = img_notch_g.clip(0,255).astype(np.uint8)
img_notch_g = img_notch_g[0:hh, 0:ww]
img_notch_r = cv2.idft(complex_ishift_r, flags=cv2.DFT_SCALE+cv2.DFT_REAL_OUTPUT)
img_notch_r = img_notch_r.clip(0,255).astype(np.uint8)
img_notch_r = img_notch_r[0:hh, 0:ww]
# combine b,g,r components
img_notch = cv2.merge([img_notch_b, img_notch_g, img_notch_r])
# write result to disk
cv2.imwrite("fake_checks.png", checks)
cv2.imwrite("fake_spectrum.png", (255*spec).clip(0,255).astype(np.uint8))
cv2.imwrite("fake_mask.png", mask)
cv2.imwrite("fake_notched.png", img_notch)
# show results
cv2.imshow("ORIGINAL", img)
cv2.imshow("CHECKS", checks)
cv2.imshow("SPECTRUM", spec)
cv2.imshow("MASK", mask)
cv2.imshow("NOTCH", img_notch)
cv2.waitKey(0)
cv2.destroyAllWindows()
Checkerboard image:
Spectrum of checkerboard:
Mask:
Result (notch filtered image):
The checkerboard pattern in the result is mitigated from the original, but still there upon close inspection.
From here one needs to threshold on the white background and invert to make an image for the alpha channel. Then convert the image to 4-channel BGRA and insert the alpha channel into the BGRA image, as I described in my other answer.
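A minimal sketch of that final step (the threshold bounds are assumptions), continuing from the img_notch result above:
# threshold on the near-white background of the notch-filtered image
gray_notch = cv2.cvtColor(img_notch, cv2.COLOR_BGR2GRAY)
background = cv2.inRange(gray_notch, 230, 255)
# invert so the background becomes transparent and the content stays opaque
alpha = 255 - background
# convert to 4-channel BGRA and insert the alpha channel
result = cv2.cvtColor(img_notch, cv2.COLOR_BGR2BGRA)
result[:, :, 3] = alpha
cv2.imwrite('fake_notched_transparent.png', result)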
Since you're working on PNGs with transparent backgrounds, it would probably be equally viable to extract the content that isn't checkered instead of trying to detect the checkered background. This could probably be achieved using a color check on all pixels. You could use OpenCV's inRange() function. I'll link a Stack Overflow question below that tries to detect dark spots on an image.
Inrange example
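Here is a minimal sketch of that idea (the inRange bounds are assumptions and would need tuning on your images):
import cv2
import numpy as np

img = cv2.imread('testimages/fake1.png')
# mark everything in the light gray/white range of the checkerboard
low = np.array([200, 200, 200])
high = np.array([255, 255, 255])
checks_mask = cv2.inRange(img, low, high)    # white where the checkerboard is
content_mask = cv2.bitwise_not(checks_mask)  # white where the artwork is
# make the checkerboard pixels transparent and keep the rest opaque
result = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)
result[:, :, 3] = content_mask
cv2.imwrite('result_inrange.png', result)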
I am kinda stuck trying to figure out how I can expand the background color inwards.
I have this image that has been generated through a mask after noisy background subtraction.
I am trying to make it into this:
So far I have tried to this, but to no avail:
import cv2
from PIL import Image
import numpy as np
img = Image.open("example_of_misaligned_frame.png") # open poor frame
img_copy = np.asanyarray(img).copy()
# thresh: a binary version of the frame, assumed to have been computed earlier (not shown)
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE) # find contours
# create bounding box around blob and figure out the rows/cols to iterate over
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
# flood fill the entire region with black, hoping that the off-white region gets filled due to connected components
for row in range(y, y+h):
    for col in range(x, x+w):
        cv2.floodFill(img_copy, None, seedPoint=(col, row), newVal=0)
This results in a completely black image :(
Any help, pointing me in the right direction, is greatly appreciated.
You can solve it by using floodFill twice:
First time - fill the black pixels with Off-White color.
Second time - fill the Off-White pixels with black color.
There is still the issue of finding the RGB values of the Off-White color.
I found an improvised solution for finding the Off-White color (I don't know the exact rules for what color is considered to be background).
Here is a working code sample:
import cv2
import numpy as np
#Image.open("example_of_misaligned_frame.png") # open poor frame
img = cv2.imread("example_of_misaligned_frame.png")
#img_copy = np.asanyarray(img).copy()
img_copy = img.copy()
# Improvised way to find the Off-White color (it works because the Off-White has the maximum color component values).
tmp = cv2.dilate(img, np.ones((50,50), np.uint8), iterations=10)
# Color of Off-White pixel
offwhite = tmp[0, 0, :]
# Convert to tuple
offwhite = tuple((int(offwhite[0]), int(offwhite[1]), int(offwhite[2])))
# Fill black pixels with off-white color
cv2.floodFill(img_copy, None, seedPoint=(0,0), newVal=offwhite)
# Fill off-white pixels with black color
cv2.floodFill(img_copy, None, seedPoint=(0,0), newVal=0, loDiff=(2, 2, 2, 2), upDiff=(2, 2, 2, 2))
cv2.imshow("img_copy", img_copy)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result of cv2.dilate:
Result of first cv2.floodFill:
Result of second cv2.floodFill:
In Python/OpenCV, you can simply extract a binary mask from your flood-filled image and erode that mask. Then reapply it to the input or to your flood-filled result.
Input:
import cv2
# read image
img = cv2.imread("masked_image.png")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# make anything not black into white
gray[gray!=0] = 255
# erode mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (51,51))
mask = cv2.morphologyEx(gray, cv2.MORPH_ERODE, kernel)
# make mask into 3 channels
mask = cv2.merge([mask,mask,mask])
# apply new mask to img
result = img.copy()
result = cv2.bitwise_and(img, mask)
# write result to disk
cv2.imwrite("masked_image_original_mask.png", gray)
cv2.imwrite("masked_image_eroded_mask.png", mask)
cv2.imwrite("masked_image_eroded_image.png", result)
# display it
cv2.imshow("IMAGE", img)
cv2.imshow("MASK", mask)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
Mask:
Eroded Mask:
Result:
Adjust the size of the circular (elliptical) morphology kernel as desired for more or less erosion.
Is there a good algorithm for detecting particles based on size? For example, I have the following image, and I want to detect and count particles that are highlighted in the below image:
It's basically based on size. The third circle from the left is about 20 microns and I want to count all the particles that meet that threshold.
How can I go about doing this? I need to do this for 10,000 images.
Thanks
As stated in the comment by Mark Setchell, we can't measure 20 microns without a reference. However, you have stated that we can take the third contour as a reference, so I first cropped it out after thresholding the image using:
import cv2
img = cv2.imread('stars.png', 0)
ret, img = cv2.threshold(img,100,255,cv2.THRESH_BINARY)
cv2.imshow("image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.imwrite('star.png', img)
After cropping in Paint:
Now its area can be used as a reference for the others. Both the original image and the extracted star image are read again, and the area of the largest contour is calculated for the latter. Then the contours of the original image are found after thresholding it, and they are looped over and measured against the reference area. (I have used half the reference area as the cutoff; you may have to tune this parameter on your other images before deciding on the best value.) Any contour satisfying the condition is drawn on a white result image, which is created using np.ones and has the same size as the original image, and a counter variable is incremented. cv2.bitwise_and is used to obtain only the accepted contours, and the total count is displayed.
Source code:
import cv2
import numpy as np
img = cv2.imread('stars.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
match = cv2.imread('star.png', 0)
h, w = img.shape[:2]
res = np.ones((h, w), np.uint8)*255
ret, thresh = cv2.threshold(gray,64,255,cv2.THRESH_BINARY)
_, cnts, _ = cv2.findContours(match, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
cnt = max(cnts, key = cv2.contourArea)
area = cv2.contourArea(cnt)
_, cnts, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
count = 0
for cnt in cnts:
    if cv2.contourArea(cnt) >= area/2:
        cv2.drawContours(res, cnt, -1, 0, 2)
        count += 1
res = cv2.bitwise_and(img, img, mask = cv2.bitwise_not(res))
cv2.imshow("res", res)
cv2.imshow("star", match)
cv2.imshow("image", img)
cv2.imshow("result", res)
cv2.waitKey(0)
cv2.destroyAllWindows()
print("total count is =", count)
This gives Output total count is = 3.
If you want to code it yourself, start with a simple thresholding: _, bin_img = cv2.threshold(img, 33, 255, cv2.THRESH_BINARY). With from skimage.measure import label you can label connected components. Filtering them by size should give you the desired result, as in the sketch below.
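A minimal sketch of that idea (the threshold and the minimum area are assumptions to calibrate against your reference particle):
import cv2
from skimage.measure import label, regionprops

img = cv2.imread('stars.png', 0)
_, bin_img = cv2.threshold(img, 33, 255, cv2.THRESH_BINARY)
labeled = label(bin_img > 0)  # label connected components
min_area = 200                # assumed pixel-area equivalent of the 20 micron cutoff
particles = [r for r in regionprops(labeled) if r.area >= min_area]
print('particles above the size threshold:', len(particles))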
I'm writing code that detects parakeets' eyes. Currently, I'm using already-written code that I found on YouTube. It's working great with the pictures that I have, but I don't know how to display a colored version of the selected area.
The results: (https://imgur.com/a/zCARrVC)
I've tried creating masks and using cv2.drawContours to repeat the already drawn contour on them. It worked, but I couldn't make the mask overlap the original image and crop it. I think it is because the contour wasn't filled, but I don't know for sure, and I don't know whether a filled contour would mess up the rest of the code.
import cv2
import numpy as np
import imutils
def nothing(x):
    pass
# Load an image
img = cv2.imread('papagaio.png')
# Resize The image
if img.shape[1] > 600:
    img = imutils.resize(img, width=600)
# Create a window
cv2.namedWindow('Treshed')
# create trackbars for treshold change
cv2.createTrackbar('Treshold','Treshed',0,255,nothing)
while(1):
    # Clone original image to not overlap drawings
    clone = img.copy()
    # Convert to gray
    gray = cv2.cvtColor(clone, cv2.COLOR_BGR2GRAY)
    # get current position of the trackbar
    r = cv2.getTrackbarPos('Treshold','Treshed')
    # Thresholding the gray image
    ret,gray_threshed = cv2.threshold(gray,r,255,cv2.THRESH_BINARY)
    # Blur the image
    bilateral_filtered_image = cv2.bilateralFilter(gray_threshed, 5, 175, 175)
    # Detect edges
    edge_detected_image = cv2.Canny(bilateral_filtered_image, 75, 200)
    # Find contours
    contours, _ = cv2.findContours(edge_detected_image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    contour_list = []
    for contour in contours:
        # approximate for circles
        approx = cv2.approxPolyDP(contour, 0.01*cv2.arcLength(contour, True), True)
        area = cv2.contourArea(contour)
        if (len(approx) > 8) & (area > 30):
            contour_list.append(contour)
    # Draw contours on the original image
    cv2.drawContours(clone, contour_list, -1, (255,0,0), 2)
    # there is an outer and an inner boundary for each edge, so contours double
    print('Number of found circles: {}'.format(int(len(contour_list)/2)))
    # Displaying the results
    cv2.imshow('Objects Detected', clone)
    cv2.imshow("Treshed", gray_threshed)
    # ESC to break
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break
# close all open windows
cv2.destroyAllWindows()
Like you said, you can create a mask and then apply it to the RGB image. Here's one way to do it:
mask = np.zeros((clone.shape[0], clone.shape[1]), np.uint8)  # create a single-channel mask
for contour in contours:
    cv2.fillPoly(mask, pts=[contour], color=255)  # cv2.drawContours with thickness=-1 should also work
masked = cv2.bitwise_and(clone, clone, mask=mask)  # keep only the masked regions of the color image
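If you also want to crop the colored region, here is a small sketch continuing from the mask above (assuming the mask is not empty):
# find the bounding box of the mask's non-zero pixels and crop the colored result
ys, xs = np.where(mask > 0)
crop = masked[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
cv2.imshow('Colored crop', crop)
cv2.waitKey(0)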
I have around 100+ images with two different texts on them. The images are below; one says occupied and the other unoccupied.
So, is there any way in Python to differentiate these images by detecting the text in them?
If so, I want to identify the occupied images and delete the unoccupied ones.
Since I am new to Python, can anyone help me do this?
This answer is based on the assumption that there are only two different texts on the images, as you posted in the question. So I assume that the number of characters and the color of the text are always the same ("Room status: Unoccupied" and "Room status: Occupied" in red). That being said, I would try a simpler way to differentiate between the two types. These images contain characters that are very close to each other, so in my opinion it would be very difficult to separate each character and identify it with OCR. I would try a simpler approach, like finding the area containing the text and measuring its length: "Unoccupied" has two more characters than "Occupied" and hence spans a greater length. You can transform the image to HSV color space and use the cv2.inRange() function to extract the text (the red color). Then you can merge the characters into one contour with cv2.morphologyEx() and get its length with cv2.minAreaRect(). Hope it helps, or at least gives you a new perspective on how to find your solution. Cheers!
Example code:
import cv2
import numpy as np
# Read the image and transform to HSV colorspace.
img = cv2.imread('ocupied.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Extract the red text.
lower_red = np.array([0,150,50])
upper_red = np.array([40,255,255])
mask_red = cv2.inRange(hsv, lower_red, upper_red)
# Search for contours on the mask.
_, contours, hierarchy = cv2.findContours(mask_red,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
# Create a new mask for further processing.
mask = np.ones(img.shape, np.uint8)*255
# Draw contours on the mask, using size and border ratio as thresholds (to remove other noise from the image).
for cnt in contours:
    size = cv2.contourArea(cnt)
    x, y, w, h = cv2.boundingRect(cnt)
    if 10000 > size > 50 and w*2.5 > h:
        cv2.drawContours(mask, [cnt], -1, (0,0,0), -1)
# Connect neighbour contours and select the biggest one (text).
kernel = np.ones((50,50),np.uint8)
opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
gray_op = cv2.cvtColor(opening, cv2.COLOR_BGR2GRAY)
_, threshold_op = cv2.threshold(gray_op, 150, 255, cv2.THRESH_BINARY_INV)
_, contours_op, hierarchy_op = cv2.findContours(threshold_op, cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cnt = max(contours_op, key=cv2.contourArea)
# Create rotated rectangle to get the 4 points of the rectangle.
rect = cv2.minAreaRect(cnt)
# Create a bounding box and calculate the "length" of the text.
box = cv2.boxPoints(rect)
a, b, c, d = box = np.int0(box)
bound =[]
bound.append(a)
bound.append(b)
bound.append(c)
bound.append(d)
bound = np.array(bound)
(x1, y1) = (bound[:,0].min(), bound[:,1].min())
(x2, y2) = (bound[:,0].max(), bound[:,1].max())
# Draw the rectangle.
cv2.rectangle(img,(x1,y1),(x2,y2),(0,255,0),1)
# Identify the room status.
if x2 - x1 > 200:
    print('unoccupied')
else:
    print('occupied')
# Display the result
cv2.imshow('img', img)
Result:
occupied
unoccupied
Using the Tesseract OCR engine and the Python wrapper pytesseract, this is a task of only a few lines:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
img = Image.open('D:\\tmp2.jpg').crop((0,0,250,35))
print(pytesseract.image_to_string(img, config='--psm 7'))
I have tested this on Windows 7. Of course, I have assumed that the text appears at the same position in every image (from your example, it does seem to be the case). Otherwise, you need to find a better cropping mechanism, for instance the sketch below.
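If the text does not always sit at the same position, one possible cropping mechanism (a sketch with assumed HSV bounds, borrowing the red-text idea from the other answer) is to locate the red text first and crop its bounding box before running pytesseract:
import cv2
import numpy as np
import pytesseract
from PIL import Image

img = cv2.imread('tmp2.jpg')  # hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
red_mask = cv2.inRange(hsv, np.array([0, 150, 50]), np.array([40, 255, 255]))  # assumed red range
# crop the bounding box of the red pixels (assumes some red text is present)
ys, xs = np.where(red_mask > 0)
crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
text = pytesseract.image_to_string(Image.fromarray(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)), config='--psm 7')
print(text)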