I'm working with text detection in Tensorflow, using EAST: An Efficient and Accurate Scene Text Detector.
And I'm struggling to find any decent method/library that helps with grouping close and overlapping bounding boxes, like in the example:
I think it's a simple problem; there must be some OpenCV/Tensorflow function for this that I don't know about yet.
Any ideas?
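As far as I know there is no single built-in for exactly this (cv2.groupRectangles comes close, but it discards boxes that don't have enough neighbors). Here is a minimal sketch of a plain iterative merge, assuming the EAST output has been decoded into (x1, y1, x2, y2) tuples; the pad parameter is an assumption that also merges boxes that are merely close, not strictly overlapping:

    def overlaps(a, b, pad=0):
        # axis-aligned overlap test with optional slack
        return not (a[2] + pad < b[0] or b[2] + pad < a[0] or
                    a[3] + pad < b[1] or b[3] + pad < a[1])

    def merge_boxes(boxes, pad=5):
        # repeatedly union any pair of boxes that overlap (or nearly touch)
        boxes = list(boxes)
        changed = True
        while changed:
            changed = False
            merged = []
            for box in boxes:
                for i, other in enumerate(merged):
                    if overlaps(box, other, pad):
                        merged[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                     max(box[2], other[2]), max(box[3], other[3]))
                        changed = True
                        break
                else:
                    merged.append(box)
            boxes = merged
        return boxes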
I am trying to get data from GTA to train my object detection algorithm. However, the ground-truth boxes I get are a little off from the exact boundaries. I am trying to get the exact coordinates of the bounding box around the gun, but you can see that the game box (in red) is not exact. As I am looking to collect a lot of data, I cannot correct them manually. One approach that seemed promising is to crop the red game box and apply Canny edge detection on it, but that would not capture the edges of the object outside the box. Any automated solutions?
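One way around the "edges outside the box" problem is to pad the rough game box before cropping, then run Canny and take the bounding rectangle of the detected edge pixels. A sketch of that idea, assuming img is the BGR frame and (x, y, w, h) is the rough game box; the margin value is an assumed slack to tune:

    import cv2

    def tighten_box(img, x, y, w, h, margin=15):
        # pad the rough box so edges just outside it are still captured
        x0, y0 = max(x - margin, 0), max(y - margin, 0)
        x1 = min(x + w + margin, img.shape[1])
        y1 = min(y + h + margin, img.shape[0])
        roi = cv2.cvtColor(img[y0:y1, x0:x1], cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(roi, 50, 150)
        pts = cv2.findNonZero(edges)
        if pts is None:
            return x, y, w, h  # no edges found, keep the original box
        bx, by, bw, bh = cv2.boundingRect(pts)
        return x0 + bx, y0 + by, bw, bh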
I would like to get the coordinates of framed text on an image. The paragraphs have thin black borders. The rest of the image contains ordinary paragraphs and sketches.
Here is an example:
Do you have any idea of what kind of algorithms I should use in Python with an image library to achieve this? Thanks.
A few ideas to detect framed text, which largely comes down to searching for boxes/rectangles of substantial size:
find contours with OpenCV and analyze the shapes using the cv2.approxPolyDP() polygon approximation algorithm (also known as the Ramer–Douglas–Peucker algorithm); see the sketch after this list. You could additionally check the aspect ratio of the bounding box to make sure the shape is a rectangle, as well as check the page width, since this seems to be a known metric in your case. PyImageSearch did this amazing article:
OpenCV shape detection
in a related question, there is also a suggestion to look into Hough Lines to detect a horizontal line, then detect vertical lines the same way. Not 100% sure how reliable this approach would be.
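A minimal sketch of the contour + approxPolyDP idea, assuming a scanned page where the frames are thin black borders on a light background; the file name and the size thresholds are assumptions to tune per document:

    import cv2

    img = cv2.imread("page.png")  # hypothetical input file
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # invert + Otsu so the dark borders become white foreground
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # OpenCV 4 signature; OpenCV 3 returns three values here
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        x, y, w, h = cv2.boundingRect(approx)
        # four vertices plus substantial size -> candidate frame
        if len(approx) == 4 and w > 100 and h > 50:
            print("frame candidate at", (x, y, w, h))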
Once you find the box frames, the next step would be to check if there is any text inside them. Detecting text is a broader problem in general and there are many ways of doing it; here are a few examples:
apply EAST text detector
PixelLink
tesseract (e.g. via pytesseract), though I am not sure whether this would produce too many false positives
if it is the simpler case of deciding whether boxes are empty or not, you could check the pixel values inside, e.g. with cv2.countNonZero(); see the sketch after this list. Examples:
How to identify empty rectangle using OpenCV
Count the black pixels using OpenCV
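A sketch of the empty-vs-non-empty check, assuming thresh is the binarized page from above (content white on black) and (x, y, w, h) is a detected frame; border and min_ratio are assumed thresholds to tune:

    import cv2

    def box_has_content(thresh, x, y, w, h, border=5, min_ratio=0.01):
        # shave off the border so the frame itself does not count as content
        inner = thresh[y + border:y + h - border, x + border:x + w - border]
        if inner.size == 0:
            return False
        filled = cv2.countNonZero(inner) / float(inner.size)
        return filled > min_ratio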
Additional references:
ideas on quadrangle/rectangle detection using convolutional neural networks
Here's what I'm trying to mimic: https://www.youtube.com/watch?v=exXD6wJLJ6s
This guy is separating the video input into many square grids and analyzing each region to know what's going on in that specific region.
For me, it seemed like he was finding the dominant color for each grid cell. So I tried getting the dominant colors of an image with the k-means method and it worked well. (I'm trying to 'divide and conquer' the problem by addressing it from the smallest part.)
However, I have no idea how to get the dominant color for each grid region of an image. I think I should iterate through each grid square, but how?
Furthermore, it seems almost impossible for me to do the above task on a video. Can the same algorithm (detecting the dominant color in an image region) also apply to real-time detection in a video? Wouldn't it be too sluggish?
I'm really new to OpenCV and I'm basically just following whatever tutorials that seem to be related to my project.
To sum up: I got the dominant color of an image by following the tutorial below, and now I want to do this for each grid cell of an image/video.
https://www.pyimagesearch.com/2014/05/26/opencv-python-k-means-color-clustering/
This is what I've done so far:
I drew the grid on MSPaint:
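A minimal sketch of the grid iteration, assuming img is a BGR frame and following the PyImageSearch k-means approach per cell; rows, cols, and k are assumptions to tune:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def dominant_color(cell, k=3):
        # cluster the cell's pixels, return the center of the biggest cluster
        pixels = cell.reshape(-1, 3).astype(np.float32)
        km = KMeans(n_clusters=k, n_init=4).fit(pixels)
        counts = np.bincount(km.labels_)
        return km.cluster_centers_[counts.argmax()]

    img = cv2.imread("frame.png")  # hypothetical input
    rows, cols = 8, 8
    ch, cw = img.shape[0] // rows, img.shape[1] // cols
    for r in range(rows):
        for c in range(cols):
            cell = img[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            print((r, c), dominant_color(cell))

For real-time video, running k-means per cell per frame will likely be too sluggish; a common shortcut is to cv2.resize() the frame down to cols x rows with cv2.INTER_AREA, which gives each cell's average color almost for free.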
I am working on a form extraction module which detects text in specific segments of an image. So far I am able to remove the text and retain only the bounding box in the image.
My next step was to extract each box in the image. To do that, I am trying to detect corners in the image. But this is where I am stuck. I tried template matching. This was the result. Although the results look promising, the drawback is that this method is very time consuming, and a few corners are still not detected.
I also tried Shi-Tomasi Corner Detector after dilating the image.
What would be the best approach to solve this problem?
I suggest you detect the lines instead, e.g. using the Hough transform, followed by edge chaining, followed by robust line fitting on each chain. A sketch of the Hough step is below.
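A sketch of the Hough line detection, assuming img is the form image with the boxes; all thresholds here are assumptions to tune for your scans:

    import cv2
    import numpy as np

    img = cv2.imread("form.png")  # hypothetical input
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=50, maxLineGap=5)
    for line in lines if lines is not None else []:
        x1, y1, x2, y2 = line[0]
        # keep near-horizontal / near-vertical segments as box-edge candidates
        if abs(y2 - y1) < 3 or abs(x2 - x1) < 3:
            cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

Box corners then fall out as intersections of a horizontal and a vertical segment.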
Hi, I want to use the Python Imaging Library (PIL) to crop images to a specific size for a website. I have a problem: these images are meant to show people's faces, so I need to crop automatically based on them.
I know face detection is a difficult problem, so I'm thinking of using the face.com API http://developers.face.com/tools/#faces/detect which is fine for what I want to do.
I'm just a little stuck on how I would use this data to crop a selected area based on the majority of faces.
Can anybody help?
Joe
There is a Python library with a concept of smart cropping that, among other options, can use face detection to do a smarter crop.
It uses OpenCV under the hood, but you are isolated from it.
https://github.com/globocom/thumbor
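For reference, thumbor is driven through its URL API rather than a Python call; a hedged example of a smart (face-aware) crop request, with the host, port, and image path as placeholders:

    http://localhost:8888/unsafe/300x300/smart/example.com/photo.jpg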
If you have some rectangle that you want to excise from an image, here's what I might try first:
(optional) If the image is large, do a rough square crop centered on the face, with side length sqrt(2) times the longer edge (if rectangular). Worst case (45° rotation), it will still grab everything important.
Rotate based on the face orientation (something like rough_crop.rotate(math.degrees(math.atan(ydiff/xdiff))); trig is fun)
Do a final crop. If you did the initial crop, the face should be centered, otherwise you'll have to transform (rotate) all your old coordinates to the new image (more trig!).
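A sketch of those three steps with PIL, assuming the face.com response has been parsed into a face center, a face size, and two eye coordinates; every concrete value below is a hypothetical stand-in for the detector output:

    import math
    from PIL import Image

    img = Image.open("photo.jpg")                  # hypothetical input
    cx, cy, face_size = 300, 220, 180              # assumed detector output
    left_eye, right_eye = (260, 210), (340, 216)   # assumed detector output

    # step 1: generous square crop, sqrt(2) times the face size
    side = int(face_size * math.sqrt(2))
    rough = img.crop((cx - side // 2, cy - side // 2,
                      cx + side // 2, cy + side // 2))

    # step 2: deskew using the eye line (sign may need flipping
    # depending on the detector's coordinate convention)
    ydiff = right_eye[1] - left_eye[1]
    xdiff = right_eye[0] - left_eye[0]
    rough = rough.rotate(math.degrees(math.atan2(ydiff, xdiff)))

    # step 3: final crop centered in the rotated image
    half = face_size // 2
    final = rough.crop((side // 2 - half, side // 2 - half,
                        side // 2 + half, side // 2 + half))
    final.save("cropped.jpg")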