I'm currently working on my first assignment in image processing (using OpenCV in Python). My assignment is to calculate a precise score (to tenths of a point) of one to several shooting holes in an image uploaded by a user. One of the requirements is to transform the uploaded shooting target image to be from "birds-eye view" for further processing. For that I have decided that I need to find center coordinates of numbers (7 & 8) to select them as my 4 quadrilateral.
Unfortunately, there are several limitations that need to be taken into account.
Limitations:
resolution of the processed shooting target image can vary
the image can be taken in different lighting conditions
the image processed by this part of my algorithm will always be taken under an angle (extreme angles will be automatically rejected)
the image can be slightly rotated (+/- 10 degrees)
the shooting target can be just a part of the image
the image can be only of the center black part of the target, meaning the user doesn't have to take a photo of the whole shooting target (but there always has to be the center black part on it)
this algorithm can take a maximum of 2000ms runtime
What I have tried so far:
Template matching
here I quickly realized that it was unusable since the numbers could be slightly rotated and a different scale
Feature matching
I have tried all of the different feature matching types (SIFT, SURF, ORB...)
unfortunately, the numbers do not have that specific set of features so they matched a quite lot of false positives, but I could possibly filter them by adding shape matching, etc..
the biggest blocker was runtime, the runtime of only a single number feature matching took around 5000ms (even after optimizations) (on MacBook PRO 2017)
Optical character recognition
I mostly tried using pytesseract library
even after thresholding the image to inverted binary (so the text of numbers 7 and 8 is black and the background white) it failed to recognize them
I also tried several ways of preprocessing the image and I played a lot with the tesseract config parameter but it didn't seem to help whatsoever
Contour detection
I have easily detected all of the wanted numbers (7 & 8) as single contours but failed to filter out all of the false positives (since the image can be in different resolutions and also there are two types of targets with different sizes of the numbers I couldn't simply threshold the contour by its width, height or area)
After I would detect the numbers as contours I wanted to extract them as some ROI and then I would use OCR on them (but since there were so many false positives this would take a lot of time)
I also tried filtering them by using cv2.matchShapes function on both contours and cropped template / ROI but it seemed really unreliable
Example processed images:
high resolution version here
high resolution version here
high resolution version here
high resolution version here
high resolution version here
high resolution version here
As of right now, I'm lost on how to progress about this. I have tried everything I could think of. I would be immensely happy if any of you image recognition experts gave me any kind of advice or even better a usable code example to help me solve my problem.
Thank you all in advance.
Find the black disk by adaptive binarization and contour (possibly blur to erase the inner features);
Fit an ellipse to the outline, as accurate as possible;
Find at least one edge of the square (Hough lines);
Classify the edge as one of NWSE (according to angle);
Use the ellipse and the line information to reconstruct the perspective transformation (it is an homography);
Apply the inverse homography to straighten the image and obtain the exact target center and axis;
Again by adaptive binarization, find the bullet holes (center/radius);
Rate the holes after their distance to the center, relative to the back disk radius.
If the marking scheme is variable, detect the circles (Hough circles, using the known center, or detect peaks in an oblique profile starting from the center).
If necessary, you could OCR the digits, but it seems that the score is implicitly starting at one in the outer ring.
Related
I have the following JPG image. If I want to find the edges where the white page meets the black background. So I can rotate the contents a few degrees clockwise. My aim is to straighten the text for using with Tesseract OCR conversion. I don't see the need to rotate the text blocks as I have seen in similar examples.
In the docs Canny Edge Detection the third arg 200 eg edges = cv.Canny(img,100,200) is maxVal and said to be 'sure to be edges'. Is there anyway to determine these (max/min) values ahead of any trial & error approach?
I have used code examples which utilize the Python cv2 module. But the edge detection is set up for simpler applications.
Is there any approach I can use to take the text out of the equation. For example: only detecting edge lines greater than a specified length?
Any suggestions would be appreciated.
Below is an example of edge detection (above image same min/max values) The outer edge of the page is clearly defined. The image is high contrast b/w. It has even lighting. I can't see a need for the use of an adaptive threshold. Simple global is working. Its just at what ratio to use it.
I don't have the answer to this yet. But to add. I now have the contours of the above doc.
I used find contours tutorial with some customization of the file loading. Note: removing words gives a thinner/cleaner outline.
Consider Otsu.
Its chief virtue is that it is adaptive to local
illumination within the image.
In your case, blank margins might be the saving grace.
Consider working on a series of 2x reduced resolution images,
where new pixel is min() (or even max()!) of original four pixels.
These reduced images might help you to focus on the features
that matter for your use case.
The usual way to deskew scanned text is to binarize and
then keep changing theta until "sum of pixels across raster"
is zero, or small. In particular, with few descenders
and decent inter-line spacing, we will see "lots" of pixels
on each line of text and "near zero" between text lines,
when theta matches the original printing orientation.
Which lets us recover (1.) pixels per line, and (2.) inter-line spacing, assuming we've found a near-optimal theta.
In your particular case, focusing on the ... leader dots
seems a promising approach to finding the globally optimal
deskew correction angle. Discarding large rectangles of
pixels in the left and right regions of the image could
actually reduce noise and enhance the accuracy of
such an approach.
I get in trouble by finding an algorithm to remove the convexity of my photos. As you can see the photos are captured from book pages, and I wanna remove the convexity. My question is similar to this but what I have is just page boundaries as input and neither I have grid nor am able to find by processing algorithms.
I wanna output as the right one in the below photo.
Obviously, the perspective transformation is the first thing comes in mind. However, as you can see the result is not promising:
Here's a possible pipeline to solve your problem. The main idea is to identify the text, create a super blob of it with some morphology, locate the 4 corners of this super blob and feed the points to a perspective "unwarper" (or rectifier, or whatever you wish to call that perspective correction method).
Start by converting your image to grayscale and apply adaptive thresholding to it. Try the Gaussian or Mean methods with parameters that better fit your tests. This is the result I obtain after fiddling with the values for a bit:
Now, the idea is to isolate just the text. The solution I applied is: obtain the biggest blobs and subtract them from the original image. You're going to need a method to calculate the area of each binary blob. Check this previous post for suggestions on how to implement one.
These are the biggest blobs from the image:
Subtract the largest blobs from the original image. This is the result:
As you can see, the text is almost isolated. Let me clean up the little bits of pixels by applying, again, an area filter. This time to eliminate the small blobs. This is the result:
Very good, some characters are lost during the operation, but that’s ok. We need a nice continuous block of text, because we are gonna dilate the hell of it. I tried applying a rectangular structuring element of size 5 and 5 Op iterations. Erode the output with 5 more iterations afterward, so you end up with this nice - isolated - super blob were the text used to be:
Check it out. The 3 markers you see are the centroids of the biggest blobs that I detected on the image. We need to find the 4 corners of the super blob. The biggest blob in the image is what we are after. I decided to re-use the area filter and look for the blob with the biggest area. This is the isolated super blob:
From here, the operations are pretty straightforward. Again, the goal is to get the four corners of this blob. You can fit a rectangle or apply an edge detector followed by Hough transform, to get the straight lines that follow the edges of the super blob.
I decided to apply a Canny Edge detector followed by Hough transform. Of course, I tuned the transform to filter only the possible lines I’m interested in – straight lines above a certain length. This is the result of the line detection:
There's some extra info plotted on the image. The markers you see (red and yellow) are the start/endpoints of the lines. My idea here was to find a bunch of these lines and compute the mean of these points. The idea is that we have a cluster of points that are separated in "quadrants". If we compute the mean of the start and endpoints of each line per quadrant, we will end up with 4 means – and these are the approximate values of the super blob’s corners!
I applied K-means to the start and endpoints of the lines, but you very well prefer other methods of processing. That's ok. My approximate corners are identified by the big red O markers in the above image.
As I suggested, try giving a fixed output position for these corners. I defined the red rectangle for the corners to be mapped on. For this test, I pretty much adjusted the rectangle manually. The perspective correction yields this result:
Some suggestions:
Depending on the resolution of the input image, you could downsize it
for a faster and better result, as your input seems big enough for
that.
Tune Hough Line Detection to yield larger lines. My current
configuration detects some smaller lines and that can hinder the
corner approximation.
I choose a somewhat robust method for calculating the 4 corners of
the super blob that I’ve personally used before (Edge detection +
Hough Line Transform + K-means) but whatever processing chain you
chose to obtain the data is entirely up to you!
When humans see markers suggesting the form of a shape, they immediately perceive the shape itself, as in https://en.wikipedia.org/wiki/Illusory_contours. I'm trying to accomplish something similar in OpenCV in order to detect the shape of a hand in a depth image with very heavy noise. In this question, assume that skin color based detection is not working (actually it is the best I've achieved so far but it is not robust under changing light conditions, shadows or skin colors. Also various paper shapes (flat and colorful) are on the table, confusing color-based approaches. This is why I'm attempting to use the depth cam instead).
Here's a sample image of the live footage that is already pre-processed for better contrast and with background gradient removed:
I want to isolate the exact shape of the hand from the rest of the picture. For a human eye this is a trivial thing to do. So here are a few attempts I did:
Here's the result with canny edge detection applied. The problem here is that the black shape inside the hand is larger than the actual hand, causing the detected hand to overshoot in size. Also, the lines are not connected and I fail at detecting contours.
Update: Combining Canny and a morphological closing (4x4 px ellipse) makes contour detection possible with the following result. It is still waaay too noisy.
Update 2: The result can be slightly enhanced by drawing that contour to an empty mask, save that in a buffer and re-detect yet another contour on a merge of three buffered images. The line that combines the buffered images is is hand_img = np.array(np.minimum(255, np.multiply.reduce(self.buf)), np.uint8) which is then morphed once again (closing) and finally contour detected. The results are slightly less horrible than in the picture above but laggy instead.
Alternatively I tried to use an existing CNN (https://github.com/victordibia/handtracking) for detecting the approximate position of the hand's center (this step works) and then flood from there. In order to detect contours the result is put into an OTSU filter and then the largest contour is taken, resulting in the following picture (ignore black rectangles in the left). The problem is that some of the noise is flooded as well and the results are mediocre:
Finally, I tried background removers such as MOG2 or GMG. They are confused by the enormous amount of fast-moving noise. Also they cut off the fingertips (which are crucial for this project). Finally, they don't see enough details in the hand (8 bit plus further color reduction via equalizeHist yield a very poor grayscale resolution) to reliably detect small movements.
It's ridiculous how simple it is for a human to see the exact precise shape of the hand in the first picture and how incredibly hard it is for the computer to draw a shape.
What would be your recommended method to achieve an exact hand segmentation?
After two days of desperate testing, the solution was to VERY carefully apply thresholding to an well-preprocessed image.
Here are the steps:
Remove as much noise as you possibly can. In my case, denoising was done using Intel's pyrealsense2 (I'm using an Intel RealSense depth camera and the algorithms were written for that camera family, thus they work very well). I used rs.temporal_filter() and directly after rs.hole_filling_filter() on every frame.
Capture the very first frame. Besides capturing the exact distance to the table (for later thresholding), this step also saves a still picture that is blurred by a 100x100 px kernel. Since the camera is never mounted perfectly but slightly tilted, there's an ugly grayscale gradient going over the picture and making operations impossible. This still picture is then subtracted from every single later frame, eliminating the gradient. BTW: this gradient removal step is already incorporated in the screenshots shown in the question above
Now the picture is almost noise-free. Do not use equalizeHist. This does not simply increase the general contrast regularly but instead empathizes the remaining noise way too much. This was my main error I did in almost all experiments. Instead, apply a threshold (binary with fixed border) directly. The border is extremely thin, setting it at 104 instead of 205 makes a huge difference.
Invert colors (unless you have taken BINARY_INV in the previous step), apply contours, take the largest one and write it to a mask
Voilà!
I am trying to extract the tiles ( Letters ) placed on a Scrabble Board. The goal is to identify / read all possible words present on the board.
An example image -
Ideally, I would like to find the four corners of the scrabble Board, and apply perspective transform, for further processing.
After Perspective transform -
The algorithm that I am using is as follows -
Apply Adaptive thresholding to the gray scale image of the Scrabble Board.
Dilate / Close the image, find the largest contour in the given image, then find the convex hull, and completely fill the area enclosed by the convex hull.
Find the boundary points ( contour ) of the resultant image, then apply Contour approximation to get the corner points, then apply perspective transform
Corner Points found -
This approach works with images like these. But, as you can see, many square boards have a base, which is curved at the top and the bottom. Sometimes, the base is a big circular board. And with these images my approach fails. Example images and outputs -
Board with Circular base:
Points found using above approach:
I can post more such problematic images, but this image should give you an idea about the problem that I am dealing with. My question is -
How do I find the rectangular board when a circular board is also present in the image?
Some points I would like to state -
I tried using hough lines to detect the lines in the image, find the largest vertical line(s), and then find their intersections to detect the corner points. Unfortunately, because of the tiles, all lines seem to be distorted / disconnected, and hence my attempts have failed.
I have also tried to apply contour approximation to all the contours found in the image ( I was assuming that the large rectangle, too, would be a contour ), but that approach failed as well.
I have implemented the solution in openCV-python. Since the approach is what matters here, and the question was becoming a tad too long, I didn't post the relevant code.
I am willing to share more such problematic images as well, if it is required.
Thank you!
EDIT1
#Silencer's answer has been mighty helpful to me for identifying letters in the image, but I want to accurately find the placement of the words in the image. Hence, I feel identifying the rows and columns is necessary, and I can do that only when a perspective transform is applied to the board.
I wrote an answer on MSER text detection:
Trying to Plot OpenCV's MSER regions using matplotlib
The code generate the following results on your images.
You can have a try.
I think #silencer has already given quite promising solution.
But to perform perspective transform as you have mentioned that you have already tried with hough lines to find the largest rectangle but it fails because for tiles present.
Given you have large image data set may be more than 1000 images, you can also give a shot to Deep learning based approach where you can train a model with images as input and corresponding rectangle boundary points coordinate as outputs.
I am doing a dice value recognition hobby project that I want to run on a Raspberry Pi. For now, I am just learning OpenCV as that seems like the hardest thing for me. I have gotten this far, where I have dilated, eroded and canny filtered out the dice. This has given me a hierarchy of contours. The image shows the bounding rectangles for the parent contours:
My question is: how would I proceed to count the pips? Is it better to do some template matching for face values, or should I mathematically test if a pip is in a valid position within the bounding box?
There could be multiple ways to do it:
Use hole filling and then morphological operator to filter circles.
Simpler approach would be using white pixel density (% of white pixels). Five dot would have higher white pixel density.
Use image moments (mathematical property which represents shape and structure of image) to train the neural network for different kinds of dice faces.
Reference:
Morphology
http://blogs.mathworks.com/pick/2008/05/23/detecting-circles-in-an-image/
As Sivam Kalra Said, there are many valid approaches.
I would go with template matching, as it should be robust and relatively easy to implement.
using your green regions in the canny image, copy each found die face from the original grayscale image into a smaller search image. The search image should be slightly larger than a die face, and larger than your 6 pattern images.
optionally normalize the search image
use cvMatchTemplate with each of the 6 possible dice patterns (I recommend the CV_TM_SQDIFF_NORMED algorithm, but test which works best)
find and store the global minimum in the result image for each of the 6 matches
rotate the search image in ~2° steps from 0° to 90°, and repeat the template match for each step
the dice pattern with the lowest minimum over all steps is the correct one.
contour hierechy could be a good and very easy option, but you need a perpendicular vision.
so you can do it with contours but fitting circles with som threshold
(sorry about my apalling english)