I am trying to do character detection: draw a box around each character, then crop it and feed it to a neural network for recognition. Everything was working while I used sets of characters on a single-color background image, where segmentation was easy.
However, with real photos the lighting conditions differ and I really struggle to find the contours.
After applying some adaptive thresholding I managed to get the following results, but from there I can't figure out how to properly proceed and detect each character. I can detect half of the characters easily, but not all of them, probably because they are surrounded by lots of small irrelevant contours.
I have a feeling there is one step left but I can't figure which one.
findContours is capable of finding only about half of the characters.
For now, in short, I'm doing:
im_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
_, th1 = cv2.threshold(im_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns (contours, hierarchy)
cim, ctrs, hier = cv2.findContours(th1.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
and
th2 = cv2.adaptiveThreshold(im_gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
Images below: the original image and some variations of intermediate results.
Original picture:
After Otsu thresholding:
After adaptive thresholding:
Inverse thresholding:
So the question is: what step or steps should follow in order to segment the characters?
You can perform a difference of Gaussians (DoG). The idea is to blur the image with two different kernel sizes and subtract the respective results:
Code:
import cv2

im = cv2.imread(img, 0)  # read as grayscale
#--- it is better to take bigger kernel sizes to remove smaller edges ---
kernel1 = 15
kernel2 = 31
blur1 = cv2.GaussianBlur(im, (kernel1, kernel1), 0)
blur2 = cv2.GaussianBlur(im, (kernel2, kernel2), 0)
#--- use cv2.subtract so the uint8 difference saturates at 0 instead of wrapping around ---
dog = cv2.subtract(blur2, blur1)
cv2.imshow('Difference of Gaussians', dog)
cv2.waitKey(0)
Result:
I'm writing a program that takes an image containing a 4-by-4 grid of letters somewhere in it. E.g.:
I want to read these letters into my program and for that I'm using pytesseract for the OCR.
Before feeding the image to pytesseract I do some preprocessing with openCV to increase the odds of pytesseract working correctly.
This is the code I use for this:
import cv2
img = cv2.imread('my_image.png')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_pre_processed = cv2.threshold(img_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
And these are some sample outputs of img_pre_processed:
Since the letters in the grid are spaced apart, pytesseract has a difficult time reading them when I give it the entire image as input. It would be helpful if I knew the coordinates of every letter; then I could edit the image in such a way that pytesseract can always recognise them.
I started to try and solve this problem on my own and the solution I'm coming up with might work but it's getting rather complicated. So I'm wondering if there is a better way to do it.
At the moment I'm using the cv2.findContours() function to get all the contours of the objects in the image. For every contour I calculate the center coordinates and the area of the box you would be able to draw around it. I then sort these by area to get the largest contours. Here it starts to get more and more complicated: I can't just take the biggest 16 contours, because there might be unwanted objects in the picture that have a bigger area than the 16 letters I want. Also, some letters like O, P, Q have two contours, and their inner contour might even be bigger than another letter's outer contour (like that of the letter I, for example).
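In code, that step looks roughly like this (the file name is a placeholder, and this is only a sketch of the procedure just described):
import cv2

img = cv2.imread('my_image.png')  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
# center and bounding-box area of every contour, sorted largest-first
boxes = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    boxes.append(((x + w // 2, y + h // 2), w * h))
boxes.sort(key=lambda b: b[1], reverse=True)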
E.g. this is an image with the 18 biggest contours marked with a green box:
So to continue with my way of attacking the problem, I would have to write an algorithm that finds the contours that are most likely part of the grid, while ignoring the unwanted contours and the inner contours of letters that have two contours.
While this is possible, I'm wondering if there is a better way of doing this.
Somebody told me that if you filter the image in such a way that everything gets blurry enough for all the letters to become blobs, it might be possible to do pattern detection on a 4x4 grid of blobs. But I don't know how to do that, or whether it's even possible.
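For what it's worth, a minimal sketch of that blur-to-blobs idea (the kernel size and area threshold are guesses that would need tuning, and dark letters on a light background are assumed):
import cv2

img = cv2.imread('my_image.png')  # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# blur heavily so each letter melts into one blob
blurred = cv2.GaussianBlur(gray, (51, 51), 0)
# THRESH_BINARY_INV assumes dark letters on a light background
blobs = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
blob_centers = []
for c in contours:
    if cv2.contourArea(c) > 100:  # guessed noise filter
        x, y, w, h = cv2.boundingRect(c)
        blob_centers.append((x + w // 2, y + h // 2))
# a 4x4 letter grid should show up as 16 roughly evenly spaced centers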
So if somebody knows a better way to tackle this problem, or knows how to execute the plan of attack I mentioned earlier, that would be most helpful.
Thanks in advance!
You can simply filter the bounding rectangles by width and height. As this is a rule-based approach, it may need more example images to fine-tune the filter rules.
import cv2

# get bounding rectangles of contours
img = cv2.imread('img.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
bbox = [cv2.boundingRect(c) for c in contours]

# filter rectangles by width and height
for x, y, w, h in bbox:
    if (4 < w < 200) and (30 < h < 200):
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
I have code for finding the contours in an image with OpenCV, but it doesn't work on a messy image.
My image:
My image is a scanned paper; there is a lot of noise and there are messy areas, so I applied a Gaussian blur, Otsu thresholding and a morphological close to fix this.
My code:
import cv2 as cv

# source_image: the BGR input image loaded earlier

# Apply GaussianBlur + Otsu thresholding
grayscale_image = cv.cvtColor(source_image, cv.COLOR_BGR2GRAY)
grayscale_image = cv.GaussianBlur(grayscale_image, (5, 5), 0)
# with THRESH_OTSU the threshold value (200) is ignored; Otsu picks it automatically
ret, grayscale_image = cv.threshold(grayscale_image, 200, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
cv.imshow("grayscale_image", grayscale_image)

# Apply morphological close
kernel = cv.getStructuringElement(cv.MORPH_RECT, (5, 5))
morph_closed_image = cv.morphologyEx(grayscale_image, cv.MORPH_CLOSE, kernel)
cv.imshow("morph_closed_image", morph_closed_image)
Nevertheless, the contours created by my code are strange: messy areas of the image are recognized as contours.
My image with contours:
And this is my contour code:
contours, hierarchy = cv.findContours(morph_closed_image, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
contour_sizes = [(cv.contourArea(contour), contour) for contour in contours]
biggest_contour = max(contour_sizes, key=lambda x: x[0])[1]
contour_image = source_image.copy()
cv.drawContours(contour_image, [biggest_contour], 0, (0, 0, 255), 2)
cv.imshow('contour_image', contour_image)
Therefore, I want to fix the noise and the messy areas in my image. If I can clean them up, the contour detection should work well.
How can I fix the messy areas in my image with OpenCV?
My goal:
Please give me some advice.
M Z has a good point.
You just need to erode and then dilate. It can actually be done with the same kernel and number of iterations.
The main purpose of this is:
With the erosion, guarantee that the little white shapes are killed.
With the dilation, recover the white shape that was eroded in the region of interest (the big shape).
So, you should erode until all the little shapes are killed, then try to return the big shape to its original size with the dilation.
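A minimal sketch of that erode-then-dilate step (a morphological opening), reusing the morph_closed_image from the question; the kernel size and iteration count would need tuning:
import cv2 as cv
import numpy as np

kernel = np.ones((5, 5), np.uint8)
# erode until the small white specks disappear...
eroded = cv.erode(morph_closed_image, kernel, iterations=3)
# ...then dilate with the same kernel and iterations to restore the big shape
cleaned = cv.dilate(eroded, kernel, iterations=3)
# equivalently, in one call:
# cleaned = cv.morphologyEx(morph_closed_image, cv.MORPH_OPEN, kernel, iterations=3)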
I'm working on a script that uses different OpenCV operations for processing an image of solar panels on a house roof. My original image is the following:
After processing the image, I get the edges of the panels as follows:
It can be seen that some rectangles are broken due to the reflection of the Sun in the picture.
I would like to know if it's possible to fix those broken rectangles, maybe by using the pattern of those which are not broken.
My code is the following:
import cv2
import numpy as np

# Load image
color_image = cv2.imread("google6.jpg")
cv2.imshow("Original", color_image)

# Convert to gray
img = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)

# Apply various filters
img = cv2.GaussianBlur(img, (5, 5), 0)
img = cv2.medianBlur(img, 5)
img = img & 0x88  # bitwise mask
img = cv2.fastNlMeansDenoising(img, h=10)

# Invert to binary
ret, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# Perform morphological erosion
kernel = np.ones((5, 5), np.uint8)
erosion = cv2.morphologyEx(thresh, cv2.MORPH_ERODE, kernel, iterations=2)

# Invert image and blur it
ret, thresh1 = cv2.threshold(erosion, 127, 255, cv2.THRESH_BINARY_INV)
blur = cv2.blur(thresh1, (10, 10))

# Perform another threshold on the blurred image to get the central portion of the edge
ret, thresh2 = cv2.threshold(blur, 145, 255, cv2.THRESH_BINARY)

# Perform morphological erosion to thin the edge with an ellipse structuring element
kernel1 = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
contour = cv2.morphologyEx(thresh2, cv2.MORPH_ERODE, kernel1, iterations=2)

# Get edges
final = cv2.Canny(contour, 249, 250)
cv2.imshow("final", final)
I have tried to modify all the filters I'm using in order to reduce as much as possible the effect of the Sun in the original picture, but that is as far as I have been able to go.
I'm in general happy with the result of all those filters (although any advice is welcome), so I'd like to work on the black/white image I showed, which is already smooth enough for the post-processing I need to do.
Thanks!
The pattern is not broken in the original image, so it being broken in your binarized result must mean your binarization is not optimal.
You apply threshold() to binarize the image, and then Canny() to the binary image. The problems here are:
Thresholding removes a lot of information; it should always be the last step of any processing pipeline. Anything you lose there, you've lost for good.
Canny() should be applied to a gray-scale image, not a binary image.
The Canny edge detector is an edge detector, but you want to detect lines, not edges. See here for the difference.
So, I suggest starting from scratch.
The Laplacian of Gaussian is a very simple line detector. I took these steps:
Read in image, convert to grayscale.
Apply Laplacian of Gaussian with sigma = 2.
Invert (negate) the result and then set negative values to 0.
This is the output:
From here, it should be relatively straight-forward to identify the grid pattern.
I don't post code because I used MATLAB for this, but you can accomplish the same result in Python with OpenCV; here is a demo for applying the Laplacian of Gaussian in OpenCV.
This is Python + OpenCV code to replicate the above:
import cv2

color_image = cv2.imread("/Users/cris/Downloads/L3RVh.jpg")
img = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
out = cv2.GaussianBlur(img, (0, 0), 2)  # Note! Specify the size of the Gaussian by sigma, not the kernel size
out = cv2.Laplacian(out, cv2.CV_32F)
# negate the response and set negative values to 0 (THRESH_TOZERO ignores the maxval argument)
_, out = cv2.threshold(-out, 0, 1e9, cv2.THRESH_TOZERO)
However, it looks like OpenCV doesn't linearize (apply gamma correction) when converting from BGR to gray, unlike the conversion function I used when creating the image above. I think this gamma correction might have improved the results a bit by reducing the response to the roof tiles.
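If you want to try that, a rough sketch of linearizing before the gray conversion (this assumes a simple gamma of 2.2 rather than the exact sRGB curve):
import cv2
import numpy as np

color_image = cv2.imread("/Users/cris/Downloads/L3RVh.jpg")
# undo the display gamma so we work in linear light (2.2 approximates sRGB)
linear = (color_image.astype(np.float32) / 255.0) ** 2.2
# Rec. 601 luma weights applied to the linearized B, G, R channels
gray = 0.114 * linear[:, :, 0] + 0.587 * linear[:, :, 1] + 0.299 * linear[:, :, 2]
gray = (gray * 255.0).astype(np.uint8)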
I want to detect how many cards are present in this image using Python. I was trying a white-pixel count but not getting the correct result.
My code is given below:
import cv2
import numpy as np
img = cv2.imread('imaagi.jpg', cv2.IMREAD_GRAYSCALE)
n_white_pix = np.sum(img == 255)
print('Number of white pixels:', n_white_pix)
I am a beginner, so I am unable to figure out a way.
This solution is with respect to the image you have provided and the implementation is in OpenCV.
Code:
import cv2
import numpy as np

im = cv2.imread('C:/Users/Jackson/Desktop/cards.jpg', 1)

#--- convert the image to HSV color space ---
hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
cv2.imshow('H', hsv[:,:,0])
cv2.imshow('S', hsv[:,:,1])

#--- find Otsu threshold on the hue and saturation channels ---
ret, thresh_H = cv2.threshold(hsv[:,:,0], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
ret, thresh_S = cv2.threshold(hsv[:,:,1], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

#--- add the results of the above two ---
cv2.imshow('thresh', thresh_H + thresh_S)

#--- some morphology operations to clear unwanted spots ---
kernel = np.ones((5, 5), np.uint8)
dilation = cv2.dilate(thresh_H + thresh_S, kernel, iterations=1)
cv2.imshow('dilation', dilation)

#--- find contours on the result above (OpenCV 3.x returns three values here) ---
_, contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

#--- since a few small contours were found, retain only those above a certain area ---
im2 = im.copy()
count = 0
for c in contours:
    if cv2.contourArea(c) > 500:
        count += 1
        cv2.drawContours(im2, [c], -1, (0, 255, 0), 2)

cv2.imshow('cards_output', im2)
print('There are {} cards'.format(count))
Result:
On the terminal I got: There are 6 cards
Depending on how exactly your "white pixel approach" was working (please share more details on that if possible), you could try a simple image binarization, which is a well-established way of separating different objects/entities in your image. Granted, it only works on grayscale images, but that is something you can easily fix with OpenCV or scikit-image.
It might not provide optimal results right away, especially if the lighting conditions vary across images, or if you have (as seen above) cards that contain a wide variety of colors.
To circumvent this, you could also look into different color spaces, e.g. HSV.
If that still does not work, I would recommend image segmentation from OpenCV or similar libraries. The drawback is that they usually bring some unwanted complexity to your project, which might not be necessary if a simple approach such as binarization already works.
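A minimal sketch of that binarization idea (the file name is a placeholder; Otsu picks the threshold automatically):
import cv2

img = cv2.imread('cards.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder path
# Otsu picks a global threshold separating bright cards from the darker background
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
print('Found {} bright objects'.format(len(contours)))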
I'm new to image processing. I want to perform character segmentation in OCR. I have already done the necessary pre-processing. When I perform character segmentation by finding contours it works well, except for the characters 3 and 8.
The pre-processed image looks like this:
The output after finding contours for 3 and 8 is:
Code used:
imgGray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
ret, imgThresh = cv2.threshold(imgGray, 127, 255, cv2.THRESH_BINARY)
# OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns (contours, hierarchy)
image, contours, _ = cv2.findContours(imgThresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
But it gives good result for other characters:
How to solve this issue?
Since your image does not involve any complex background, I used Otsu's method to pick the right threshold for me:
threshold, thresh_img = cv2.threshold(imgGray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
Now when you find contours it will be easier, because contours are found for objects or characters that are in white.
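Following on from that, a small sketch of the contour step (this assumes the thresh_img from the line above and OpenCV 4.x):
contours, _ = cv2.findContours(thresh_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# draw a bounding box around each white character
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)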
Have a look at those characters that caused problems now: