I am working on a project in which I am supposed to detect defects in a PCB. I have already tried the image subtraction method. It detects missing components in the PCB, but it does not give correct output even when the PCB is not defective: it shows that a component is missing even when the component is present.
Here is my code:
import cv2
import numpy as np
from skimage.metrics import structural_similarity

# o_img: reference (good) PCB image, def_img: test image, both loaded earlier
g_o_img = cv2.cvtColor(o_img, cv2.COLOR_BGR2GRAY)
g_def_img = cv2.cvtColor(def_img, cv2.COLOR_BGR2GRAY)

# SSIM score plus the full difference image (values in [0, 1])
score, diff = structural_similarity(g_o_img, g_def_img, full=True)
diff1 = (diff * 255).astype("uint8")
cv2.imshow("diff", diff1)
print("SSIM : " + format(score))

# Keep only the darkest (most different) regions of the difference image
l_b = np.array([0])
u_b = np.array([50])
mask = cv2.inRange(diff1, l_b, u_b)

# Dilate the mask to join nearby difference pixels
kernel = np.ones((3, 3), np.uint8)
dilated = cv2.dilate(mask, kernel)
cv2.imshow("mask", mask)
cv2.imshow("dilated", dilated)
I have done perspective transformation and I am getting very good output, but since my images are taken under different lighting conditions, I am not getting proper output using image subtraction. Here is my code:
img1 = cv2.imread("./pcb_images/orig3.png")
img1 = cv2.resize(img1, (640, 480))
image1 = img1[10:-10, 10:-10]

img2 = cv2.imread("./pcb_images/defect4.png")
img2 = cv2.resize(img2, (640, 480))
image2 = img2[10:-10, 10:-10]

# Use the L (lightness) channel of the LAB color space
g_o_img = cv2.cvtColor(image1, cv2.COLOR_BGR2LAB)[..., 0]
g_def_img = cv2.cvtColor(image2, cv2.COLOR_BGR2LAB)[..., 0]

# Threshold both images, then XOR to highlight the differences
thresh1 = cv2.threshold(g_o_img, 130, 255, cv2.THRESH_BINARY)[1]
thresh2 = cv2.threshold(g_def_img, 130, 255, cv2.THRESH_BINARY)[1]
res = cv2.bitwise_xor(thresh1, thresh2, mask=None)
You have to do some preprocessing before background subtraction. First detect the PCB only (without the background) and make sure the perspective is always the same; if it is not, apply a perspective transformation to crop the PCB out of the image. Then convert the image to a color space that is more robust to light and shadows. After that you can do background subtraction followed by thresholding to get a binary image. Finally, apply morphological transformations to remove the false positives and leave only the differences between the images.
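For the perspective transformation step (not repeated in the solution below), here is a minimal sketch, assuming the four PCB corners have already been located (e.g. from the largest contour); the corner coordinates here are placeholders:
import cv2
import numpy as np
img = cv2.imread("orig.png")
# Hypothetical PCB corners in the source image,
# ordered top-left, top-right, bottom-right, bottom-left
src_pts = np.float32([[35, 20], [600, 28], [610, 450], [28, 440]])
dst_pts = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])
# Warp the PCB onto a flat 640x480 canvas
M = cv2.getPerspectiveTransform(src_pts, dst_pts)
pcb = cv2.warpPerspective(img, M, (640, 480))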
Solution
This is the implementation of the approach explained above. You can experiment with different color spaces, threshold values or morphological kernel sizes to see which will give you better results.
import cv2
import numpy as np
kernel = np.ones((7,7),np.uint8)
img1 = cv2.imread("orig.png")
img1 = cv2.resize(img1, (640, 480))
image1 = img1[10:-10,10:-10]
img2 = cv2.imread("defect.png")
img2 = cv2.resize(img2, (640, 480))
image2 = img2[10:-10,10:-10]
#Changing color space
g_o_img = cv2.cvtColor(image1, cv2.COLOR_BGR2LAB)[..., 0]
g_def_img = cv2.cvtColor(image2, cv2.COLOR_BGR2LAB)[..., 0]
#Image subtraction
sub = cv2.subtract(g_o_img, g_def_img)
thresh = cv2.threshold(sub, 130, 255, cv2.THRESH_BINARY)[1]
#Morphological opening
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
#Detecting blobs
params = cv2.SimpleBlobDetector_Params()
params.filterByInertia = False
params.filterByConvexity = False
params.filterByCircularity = False
im=cv2.bitwise_not(opening)
detector = cv2.SimpleBlobDetector_create(params)
keypoints = detector.detect(im)
#Drawing circle around blobs
im_with_keypoints = cv2.drawKeypoints(img2, keypoints, np.array([]), (0,0,255), cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
#Display image with circle around defect
cv2.imshow('image',im_with_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
Subtraction Result
Binary Image
Defect Image
I'm using OpenCV to detect pneumonia in chest X-rays using image processing, so I need to remove the area attached to the image border to get the lungs only. Can anyone help me code this in Python?
This image explains what I want. It shows the image after applying these methods: resizing, histogram equalization, Otsu thresholding with inverse binary thresholding, and morphological processing (opening then closing).
This is the original image:
Here is how I would approach the problem in Python/OpenCV: add a white border all around, flood fill it with black to replace the white, then remove the extra border.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread('lungs.jpg')
h, w = img.shape[:2]
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# add 1 pixel white border all around
pad = cv2.copyMakeBorder(gray, 1,1,1,1, cv2.BORDER_CONSTANT, value=255)
h, w = pad.shape
# create zeros mask 2 pixels larger in each dimension
mask = np.zeros([h + 2, w + 2], np.uint8)
# floodfill outer white border with black
img_floodfill = cv2.floodFill(pad, mask, (0, 0), 0, 5, 0, flags=8)[1]
# remove border
img_floodfill = img_floodfill[1:h-1, 1:w-1]
# save cropped image
cv2.imwrite('lungs_floodfilled.png',img_floodfill)
# show the images
cv2.imshow("img_floodfill", img_floodfill)
cv2.waitKey(0)
cv2.destroyAllWindows()
You can try using morphological reconstruction with the border as a marker. This is an analogue of the imclearborder function from MATLAB or Octave.
import cv2
import numpy as np
img = cv2.imread('5R0Zs.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY)[1]
kernel = np.ones((7,7),np.uint8)
kernel2 = np.ones((3,3),np.uint8)
marker = thresh.copy()
marker[1:-1, 1:-1] = 0

# Morphological reconstruction: dilate the border marker, constrained by the
# thresholded image, until it stops changing
while True:
    tmp = marker.copy()
    marker = cv2.dilate(marker, kernel2)
    marker = cv2.min(thresh, marker)
    difference = cv2.subtract(marker, tmp)
    if cv2.countNonZero(difference) == 0:
        break

# Everything reconstructed from the border is border-connected; invert to keep the rest
mask = cv2.bitwise_not(marker)
mask_color = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
out = cv2.bitwise_and(img, mask_color)
cv2.imwrite('out.png', out)
cv2.imshow('result', out )
cv2.waitKey(0) # waits until a key is pressed
cv2.destroyAllWindows()
I have these images, for which I want to remove the text in the background. Only the captcha characters should remain (i.e. K6PwKA, YabVzu). The task is to identify these characters later using Tesseract.
This is what I have tried, but it isn't giving very good accuracy.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"
img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)
How can I improve this?
Note:
I tried all the suggestions that I was getting for this question and none of them worked for me.
EDIT:
According to Elias, I tried finding the color of the captcha text using Photoshop by converting it to grayscale, which came out to be somewhere between [100, 105]. I then thresholded the image based on this range, but the result I got from Tesseract was not satisfactory.
gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')
Output:
'KEP wKA'
Result:
EDIT 2:
def get_text(img_name):
    lower = (100, 100, 100)
    upper = (104, 104, 104)
    img = cv2.imread(img_name)
    img_rgb_inrange = cv2.inRange(img, lower, upper)
    neg_rgb_image = ~img_rgb_inrange
    cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
    data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
    return data
gives:
and the text as
GXuMuUZ
Is there any way to soften it a little?
Here are two potential approaches and a method to correct distorted text:
Method #1: Morphological operations + contour filtering
Obtain binary image. Load the image, convert to grayscale, then Otsu's threshold.
Remove text contours. Create a rectangular kernel with cv2.getStructuringElement() and then perform morphological operations to remove noise.
Filter and remove small noise. Find contours and filter using contour area to remove small particles. We effectively remove the noise by filling in the contours with cv2.drawContours().
Perform OCR. We invert the image then apply a slight Gaussian blur. We then OCR using Pytesseract with the --psm 6 configuration option to treat the image as a single block of text. Look at Tesseract improve quality for other methods to improve detection and Pytesseract configuration options for additional settings.
Input image -> Binary -> Morph opening
Contour area filtering -> Invert -> Apply blur to get result
Result from OCR
YabVzu
Code
import cv2
import pytesseract
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold
image = cv2.imread('2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Find contours and remove small noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 50:
        cv2.drawContours(opening, [c], -1, 0, -1)
# Invert and apply slight Gaussian blur
result = 255 - opening
result = cv2.GaussianBlur(result, (3,3), 0)
# Perform OCR
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()
Method #2: Color segmentation
With the observation that the desired text to extract has a distinguishable contrast from the noise in the image, we can use color thresholding to isolate the text. The idea is to convert to the HSV color space then color threshold to obtain a mask using a lower/upper color range. From there we use the same process to OCR with Pytesseract.
Input image -> Mask -> Result
Code
import cv2
import pytesseract
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('2.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)
# Invert image and OCR
invert = 255 - mask
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('mask', mask)
cv2.imshow('invert', invert)
cv2.waitKey()
Correcting distorted text
OCR works best when the image is horizontal. To ensure the text is in an ideal format for OCR, we can perform a perspective transform. After removing all the noise to isolate the text, we can perform a morph close to combine individual text contours into a single contour. From here we can find the rotated bounding box using cv2.minAreaRect and then perform a four point perspective transform using imutils.perspective.four_point_transform. Continuing from the cleaned mask, here are the results:
Mask -> Morph close -> Detected rotated bounding box -> Result
Output with the other image
Updated code to include perspective transform
import cv2
import pytesseract
import numpy as np
from imutils.perspective import four_point_transform
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)
# Morph close to connect individual text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find rotated bounding box then perspective transform
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rect = cv2.minAreaRect(cnts[0])
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(image,[box],0,(36,255,12),2)
warped = four_point_transform(255 - mask, box.reshape(4, 2))
# OCR
data = pytesseract.image_to_string(warped, lang='eng', config='--psm 6')
print(data)
cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('warped', warped)
cv2.imshow('image', image)
cv2.waitKey()
Note: The color threshold range was determined using this HSV threshold script
import cv2
import numpy as np

def nothing(x):
    pass

# Load image
image = cv2.imread('2.png')

# Create a window
cv2.namedWindow('image')

# Create trackbars for color change
# Hue is from 0-179 for OpenCV
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)

# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0

while(1):
    # Get current positions of all trackbars
    hMin = cv2.getTrackbarPos('HMin', 'image')
    sMin = cv2.getTrackbarPos('SMin', 'image')
    vMin = cv2.getTrackbarPos('VMin', 'image')
    hMax = cv2.getTrackbarPos('HMax', 'image')
    sMax = cv2.getTrackbarPos('SMax', 'image')
    vMax = cv2.getTrackbarPos('VMax', 'image')

    # Set minimum and maximum HSV values to display
    lower = np.array([hMin, sMin, vMin])
    upper = np.array([hMax, sMax, vMax])

    # Convert to HSV format and color threshold
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)
    result = cv2.bitwise_and(image, image, mask=mask)

    # Print if there is a change in HSV value
    if((phMin != hMin) | (psMin != sMin) | (pvMin != vMin) | (phMax != hMax) | (psMax != sMax) | (pvMax != vMax)):
        print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin, sMin, vMin, hMax, sMax, vMax))
        phMin = hMin
        psMin = sMin
        pvMin = vMin
        phMax = hMax
        psMax = sMax
        pvMax = vMax

    # Display result image
    cv2.imshow('image', result)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
Your code produces better results than this. Here, I set thresholds for the upperb and lowerb values based on the histogram CDF values and a tolerance threshold. Press the ESC key to get the next image.
This code is unnecessarily complex and needs to be optimized in various ways. It can be reordered to skip some steps; I kept it as-is because some parts may help others. Some existing noise can be removed by keeping only contours with area above a certain threshold (see the sketch below). Any suggestions on other noise reduction methods are welcome.
Similar, easier code for getting the 4 corner points for a perspective transform can be found here:
Accurate corners detection?
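As a rough sketch of that contour-area suggestion, assuming binary is an inverted binary image (white text on black, like the inverted images in the pipeline below) and an area threshold of 30 as a placeholder to tune:
import cv2
# Fill every contour smaller than the area threshold with black
cnts = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    if cv2.contourArea(c) < 30:
        cv2.drawContours(binary, [c], -1, 0, -1)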
Code Description:
Original image
Median filter (noise removal and ROI identification)
Otsu thresholding
Invert image
Use the inverted black-and-white image as a mask to keep mostly the ROI part of the original image
Dilation for largest-contour finding
Mark the ROI by drawing a rectangle and corner points in the original image
Straighten the ROI and extract it
Median filter
Otsu thresholding
Invert image for mask
Mask the straightened image to remove most of the noise around the text
inRange is used with lowerb and upperb values from the histogram CDF, as mentioned above, to further reduce noise
Maybe eroding the image at this step would produce a somewhat acceptable result. Instead, here the image is dilated again and used as a mask to get a less noisy ROI from the perspective transformed image.
Code:
## Press ESC button to get next image
import cv2
import cv2 as cv
import numpy as np
frame = cv2.imread('extra/c1.png')
#frame = cv2.imread('extra/c2.png')
## keeping a copy of original
print(frame.shape)
original_frame = frame.copy()
original_frame2 = frame.copy()
## Show the original image
winName = 'Original'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
## Apply median blur
frame = cv2.medianBlur(frame,9)
## Show the original image
winName = 'Median Blur'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
#kernel = np.ones((5,5),np.uint8)
#frame = cv2.dilate(frame,kernel,iterations = 1)
# Otsu's thresholding
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
ret2,thresh_n = cv.threshold(frame,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
frame = thresh_n
## Show the original image
winName = 'Otsu Thresholding'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
## invert color
frame = cv2.bitwise_not(frame)
## Show the original image
winName = 'Invert Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
## Dilate image
kernel = np.ones((5,5),np.uint8)
frame = cv2.dilate(frame,kernel,iterations = 1)
##
## Show the original image
winName = 'SUB'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
img_gray = cv2.cvtColor(original_frame, cv2.COLOR_BGR2GRAY)
cv.imshow(winName, img_gray & frame)
cv.waitKey(0)
## Show the original image
winName = 'Dilate Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
## Get largest contour from contours
contours, hierarchy = cv2.findContours(frame, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
## Get minimum area rectangle and corner points
rect = cv2.minAreaRect(max(contours, key = cv2.contourArea))
print(rect)
box = cv2.boxPoints(rect)
print(box)
## Sorted points by x and y
## Not used in this code
print(sorted(box , key=lambda k: [k[0], k[1]]))
## draw anchor points on corner
frame = original_frame.copy()
z = 6
for b in box:
    # boxPoints returns floats; cv2.circle needs integer coordinates
    cv2.circle(frame, tuple(int(v) for v in b), z, 255, -1)
## show original image with corners
box2 = np.int0(box)
cv2.drawContours(frame,[box2],0,(0,0,255), 2)
cv2.imshow('Detected Corners',frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
## https://stackoverflow.com/questions/11627362/how-to-straighten-a-rotated-rectangle-area-of-an-image-using-opencv-in-python
def subimage(image, center, theta, width, height):
    shape = (image.shape[1], image.shape[0])  # cv2.warpAffine expects dsize as (width, height)
    matrix = cv2.getRotationMatrix2D(center=center, angle=theta, scale=1)
    image = cv2.warpAffine(src=image, M=matrix, dsize=shape)
    x = int(center[0] - width / 2)
    y = int(center[1] - height / 2)
    image = image[y:y+height, x:x+width]
    return image
## Show the original image
winName = 'Dilate Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
## use the calculated rectangle attributes to rotate and extract it
frame = subimage(original_frame, center=rect[0], theta=int(rect[2]), width=int(rect[1][0]), height=int(rect[1][1]))
original_frame = frame.copy()
cv.imshow(winName, frame)
cv.waitKey(0)
perspective_transformed_image = frame.copy()
## Apply median blur
frame = cv2.medianBlur(frame,11)
## Show the original image
winName = 'Median Blur'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
#kernel = np.ones((5,5),np.uint8)
#frame = cv2.dilate(frame,kernel,iterations = 1)
# Otsu's thresholding
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
ret2,thresh_n = cv.threshold(frame,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
frame = thresh_n
## Show the original image
winName = 'Otsu Thresholding'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
## invert color
frame = cv2.bitwise_not(frame)
## Show the original image
winName = 'Invert Image'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
## Dilate image
kernel = np.ones((5,5),np.uint8)
frame = cv2.dilate(frame,kernel,iterations = 1)
##
## Show the original image
winName = 'SUB'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
img_gray = cv2.cvtColor(original_frame, cv2.COLOR_BGR2GRAY)
frame = img_gray & frame
frame[np.where(frame==0)] = 255
cv.imshow(winName, frame)
cv.waitKey(0)
hist,bins = np.histogram(frame.flatten(),256,[0,256])
cdf = hist.cumsum()
cdf_normalized = cdf * hist.max()/ cdf.max()
print(cdf)
print(cdf_normalized)
hist_image = frame.copy()
## two decreasing range algorithm
low_index = -1
for i in range(0, 256):
    if cdf[i] > 0:
        low_index = i
        break
print(low_index)
tol = 0
tol_limit = 20
broken_index = -1
past_val = cdf[low_index] - cdf[low_index + 1]
for i in range(low_index + 1, 255):
    cur_val = cdf[i] - cdf[i+1]
    if tol > tol_limit:
        broken_index = i
        break
    if cur_val < past_val:
        tol += 1
    past_val = cur_val
print(broken_index)
##
lower = min(frame.flatten())
upper = max(frame.flatten())
print(min(frame.flatten()))
print(max(frame.flatten()))
#img_rgb_inrange = cv2.inRange(frame_HSV, np.array([lower,lower,lower]), np.array([upper,upper,upper]))
img_rgb_inrange = cv2.inRange(frame, (low_index), (broken_index))
neg_rgb_image = ~img_rgb_inrange
## Show the original image
winName = 'Final'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, neg_rgb_image)
cv.waitKey(0)
kernel = np.ones((3,3),np.uint8)
frame = cv2.erode(neg_rgb_image,kernel,iterations = 1)
winName = 'Final Dilate'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
#cv.resizeWindow(winName, 800, 800)
cv.imshow(winName, frame)
cv.waitKey(0)
##
winName = 'Final Subtracted'
cv.namedWindow(winName, cv.WINDOW_NORMAL)
img2 = np.zeros_like(perspective_transformed_image)
img2[:,:,0] = frame
img2[:,:,1] = frame
img2[:,:,2] = frame
frame = img2
cv.imshow(winName, perspective_transformed_image | frame)
cv.waitKey(0)
##
import matplotlib.pyplot as plt
plt.plot(cdf_normalized, color = 'b')
plt.hist(hist_image.flatten(),256,[0,256], color = 'r')
plt.xlim([0,256])
plt.legend(('cdf','histogram'), loc = 'upper left')
plt.show()
1. Median Filter:
2. OTSU Threshold:
3. Invert:
4. Inverted Image Dilation:
5. Extract by Masking:
6. ROI points for transform:
7. Perspective Corrected Image:
8. Median Blur:
9. OTSU Threshold:
10. Inverted Image:
11. ROI Extraction:
12. Clamping:
13. Dilation:
14. Final ROI:
15. Histogram plot of step 11 image:
I didn't try it, but this might work.
Step 1:
Use Photoshop (or similar) to find out what color the captcha characters are. For example, "YabVzu" is (128, 128, 128).
Step 2:
Use Pillow's getdata()/getcolor() methods; they return a sequence containing the colour of every pixel. Then we map every item in the sequence back to the original captcha image, so we know the position of every pixel in the image.
Step 3:
Find all pixels whose colour has the most approximate values to (128, 128, 128). You may set a threshold to control the accuracy. This step returns another sequence; let's annotate it as Seq A.
Step 4:
Generate a picture with the very same height and width as the original one. Plot every pixel in Seq A at the exact same position in the picture; here we get a cleaned training item (see the sketch after these steps).
Step 5:
Use a Keras project to break the code. The precision should be over 72%.
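A minimal sketch of Steps 1-4 using Pillow; the filename, target color, and per-channel tolerance are placeholders to tune:
from PIL import Image

target = (128, 128, 128)   # color of the captcha text (from Step 1)
tol = 10                   # per-channel tolerance (the Step 3 threshold)

img = Image.open("captcha.png").convert("RGB")
w, h = img.size

# Step 2: getdata() returns pixels row by row, so index i maps to (i % w, i // w)
pixels = list(img.getdata())

# Steps 3-4: keep only pixels close to the target color, plot them on a white canvas
clean = Image.new("RGB", (w, h), (255, 255, 255))
for i, (r, g, b) in enumerate(pixels):
    if abs(r - target[0]) <= tol and abs(g - target[1]) <= tol and abs(b - target[2]) <= tol:
        clean.putpixel((i % w, i // w), (0, 0, 0))

clean.save("captcha_cleaned.png")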
I have a drone FPV video from which I need to extract GPS coordinates. The text is white, but because of the bad quality of the video it appears gray and light blue. Since the background is changing, I have some problems: in some frames the background has a totally different color from the text, and in other frames a similar one.
Here is 2 original images (frames) from the video:
Dark background
Light background
And here is the code that I've found after googling:
import numpy as np
import cv2
import pytesseract
cap = cv2.VideoCapture('v1.avi')
p = 10000
while(cap.isOpened()):
    ret, frame = cap.read()
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    img = img[380:460, 220:640]
    img = cv2.bilateralFilter(img, 9, 27, 27)
    img = cv2.threshold(img, 0, 255,
                        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    img = cv2.GaussianBlur(img, (9, 9), 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    img = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
    img = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)[1]
    img = cv2.dilate(img, kernel)
    img = cv2.threshold(img, 0, 250, cv2.THRESH_BINARY_INV)[1]
    cv2.imshow('frame', img)
    cv2.imshow('or', frame)
    print('\n==============')
    print(pytesseract.image_to_string(img, config='digits'))
    if cv2.waitKey(50) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
And also the results:
Dark background
Light background
As you can see, in the second case the background isn't clean; there is some noise, and Tesseract doesn't extract the text from that image properly.
EDIT:
For certain reasons I can't share the video I wrote about above, but here is a similar video from YouTube. If the text can be extracted from that video, I guess that method will also work for mine, or at least solve many of the problems:
I was able to get something working using a combination of cv2.bilateralFilter and cv2.adaptiveThreshold. Once the background is mostly one main blob, the numbers can be extracted based on their patch sizes.
import cv2
import numpy as np

# frame: a single BGR frame read from the video (e.g. with cv2.VideoCapture)
img = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Bilateral filter and adaptive thresholding to get the background into mostly one patch
img = cv2.bilateralFilter(img, 9, 29, 29)
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 13, 0)

# Add padding to join any background around the edges into the same patch
pad = 2
img_pad = cv2.copyMakeBorder(thresh, pad, pad, pad, pad, cv2.BORDER_CONSTANT, value=1)

# Label patches and remove padding
ret, markers = cv2.connectedComponents(img_pad)
markers = markers[pad:-pad, pad:-pad]

# Count pixels in each patch
counts = [(markers == i).sum() for i in range(markers.max() + 1)]

# Keep patches based on pixel counts
maxCount = 200  # removes large background patches
minCount = 40   # removes specks and centres of numbering
keep = [c < maxCount and c > minCount for c in counts]

output = markers.copy()
for i, k in enumerate(keep):
    output[markers == i] = k
Here is what the images look like at each stage.
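To hand the result to Tesseract, a possible follow-up sketch, assuming the variables from the snippet above (the digits config mirrors the question's code and may need tuning):
import pytesseract

# 'output' holds 0/1 per pixel after the keep-filter above; scale to 0/255
# and invert so Tesseract sees dark digits on a light background
ocr_img = 255 - output.astype(np.uint8) * 255
print(pytesseract.image_to_string(ocr_img, config='digits'))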