Extracting data from graphs in a scanned document - python

EDIT: This is a deeper explanation of a question I asked earlier, which is still not solved for me.
I'm currently trying to write some code that can extract data from some uncommon graphs in a book. I scanned the pages of the book, and by using opencv I would like to detect some features ofthe graphs in order to convert them into useable data. In the left graph I'm looking for the height of the "triangles" and in the right graph the distance from the center to the points where the dotted lines intersect with the gray area. In both cases I would like to convert these values into numeric data for further usage.
For the left graph, I thought of detecting all the individual colors and computing the area of each sector by counting the amount of pixels in that color. When I have the area of these sectors, I can easily calculate their heights, using basic math. The following code snippet shows how far I've gotten already with identifying different colors. However I can't manage to make this work accurately. It always seems to detect some colors of other sectors as well, or not detect all pixels of one sector. I think it has something to do with the boundaries I'm using. I can't quite figure out how to make them work. Does someone know how I can determine these values?
import numpy as np
import cv2
img = cv2.imread('images/test2.jpg')
lower = np.array([0,0,100])
upper = np.array([50,56,150])
mask = cv2.inRange(img, lower, upper)
output = cv2.bitwise_and(img, img, mask = mask)
cv2.imshow('img', img)
cv2.imshow('mask', mask)
cv2.imshow('output', output)
cv2.waitKey(0)
cv2.destroyAllWindows()
For the right graph, I still have no idea how to extract data from it. I thought of identifying the center by detecting all the dotted lines, and then by detecting the intersections of these dotted lines with the gray area, I could measure the distance between the center and these intersections. However I couldn't yet figure out how to do this properly, since it sounds quite complex. The following code snippet shows how far I've gotten with the line detection. Also in this case the detection is far from accurate. Does someone have an idea how to tackle this problem?
import numpy as np
import cv2
# Reading the image
img = cv2.imread('test2.jpg')
# Convert the image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Apply edge detection
edges = cv2.Canny(gray,50,150,apertureSize = 3)
# Line detection
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength=50,maxLineGap=20)
for line in lines:
x1,y1,x2,y2 = line[0]
cv2.line(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imwrite('linesDetected.jpg',img)

For the left image, using your approach, try to look at the RGB histogram, the colors should be significant peaks, if you would like to use the relative area of the segments.
Another alternative could be to use Hough Circle Transform, which should work on circle segments. See also here.
For the right image ... let me think ...
You could create a "empty" diagram with no data inside. You know the locations of the circle segment ("cake pieces"). Then you could identify the area where the data is (the dark ones), either by using a grey threshold, an RGB threshold, or Find Contours or look for Watershed / Distance Transform.
In the end the idea is to make a boolean overlay between the cleared image and the segments (your data) that was found. Then you can identify which share of your circle segments is covered, or knowing the center, find the farthest point from the center.

Related

Python Tesseract figuring out text orientation/transformation

# Tesseract Win-Installer https://github.com/UB-Mannheim/tesseract/wiki
import pytesseract
import cv2
image = cv2.imread("img.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
text = pytesseract.image_to_string(image)
print(text)
the output is not even close "WX017," instead of "MX011A"
however, if I manually rearrange the characters it works. I could transform the input image and define an ROI but the orientation could be anything. It could be upside down as well.
I want to recognize curved text around a circle
1:
2:
3:
This is pretty difficult, as tesseract expects text to be minimally distorted.
One (admittedly far-fetched) possibility would be to try and detect the circle, then map it onto a rectangle.
To do this, you can reduce each letter to a non-connected blob using a blur filter and discarding grey values below a threshold; iterate until you get more or less circular blobs, then get their centers. Take several of those three by three at random, and for each triplet calculate the center of the circle encompassing all three. The average of those centers should be more or less the center of the lettering circle.
Having the center, and the approximate radius, it is relatively easy to map the circular crown of appropriate height to a rectangle (e.g. using polar-to-cartesian transform).
You then apply tesseract to the transformed rectangle.
It should also be possible to use autocorrelation to average and sharpen a multiple identical text along said rectangle (i.e. "MX011A MXOI1A MX017A" --> "MX011A").

Hough circles on edges

I have the following image in which I've detected the borders, representing 7 circles. In my opinion, it is fairly easy to identify the circles on it, but I am having trouble detecting all circles with the opencv Hough transform. Here it is what I've tried:
img = cv2.imread('sample.png',0)
edges = cv2.Canny(img,20,120)
circles = cv2.HoughCircles(edges,cv2.HOUGH_GRADIENT,1,100,
param2=40,minRadius=0,maxRadius=250)
I either get the central circle, the outer one or a lot of circles, depending on the parameters I input on the function. Do you guys have a set of parameters that would output all the circles?
Thanks in advance
Solved with this example from scikit-image adjusting canny thresholds to match the posted image and also the radii range.
Thanks to #barny

Detecting overlay in video with python

I am working with frames from a video. The video is overlaid with several semi-transparent boxes and my goal is to find the coordinates of these boxes. These boxes are the only fixed points in the video - the camera is moving, color intensity changes, there is no fixed reference. The problem is that the boxes are semi-transparent, so they also change with the video, albeit not as much. It seems that neither background substraction nor tracking have the right tools for this problem.
Nevertheless, I've tried the background substractors that come with cv2 as well as some homebrewn methods using differences between frames and thresholding. Unfortunately, these don't work due to the box transparency.
For reference, here is what the mean difference between the first 50 frames looks like:
And here is what cv2 background subtractor KNN returns:
I've experimented with thresholds, number of frames taken into account, various contouring algorithms, blurring/sharpening/etc. I've also tried techniques from document layout analysis.
I wonder if maybe there is something I'm missing due to not knowing the right keyword. I don't expect anyone here to give me the perfect solution, but any pointers as to where to look/what approach to try, are appreciated. I'm not bound to cv2 either, anything that works in python will do.
If you take a sample of random frames as elements of an array, and calculate the FFT, all the semi-transparent boxes will have a very high signal, and the rest of the pixels would behave as noise, so noise remotion will filter away the semi-transparent boxes. You can add the result of your other methods as additional frames for the fft
You are trying to find something that does not changes on the entire video, so do not use consecutive frames, or if you are forced to use consecutive frames, shuffle them randomly.
To gain speed, you may only take only one color channel from each frame, and pick the color channel randomly. That way the colors becomes noise, and cancel each other.
If the FFT is too expensive, just averaging random frames should filter the noise.
Ok here is first step, you can make Canny from that image, from canny you can make countours:
import cv2
import random as rng
image = cv2.imread("c:\stackoverflow\interface.png")
edges = cv2.Canny(image, 100, 240)
contoursext, hierarchy = cv2.findContours(
edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#cv2.RETR_EXTERNAL would work better if the image would not be framed.
for i in range(len(contoursext)):
color = (rng.randint(0,256), rng.randint(0,256), rng.randint(0,256))
cv2.drawContours(image, contoursext, i, color, 1, cv2.LINE_8, hierarchy, 0)
# Show in a window
cv2.imshow("Canny", edges)
cv2.imshow("Contour", image)
cv2.waitKey(0)
Then you can test if the contour or combination of 2 contours is rectangles for example...wich would probably detect most of the rectangle overlays...
Or Also you can try to detect canny lines if they are similar to rectangles.

Detect rectanglular signature fields in document scans using OpenCV

I am trying to extract rectangular big boxes from document images with signatures in it. Since i don't have training data (for deep learning), i want to cut rectangular boxes (3 in all images) from these images using OpenCV.
Here is what I tried:
import numpy as np
import cv2
img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(gray,127,255,1)
contours,h = cv2.findContours(thresh,1,2)
for cnt in contours:
approx = cv2.approxPolyDP(cnt,0.02*cv2.arcLength(cnt,True),True)
if len(approx)==4:
cv2.drawContours(img,[cnt],0,(26,60,232),-1)
cv2.imshow('img',img)
cv2.waitKey(0)
sample image
With the above code, I get a lot of squares (around 152 small points like squares) and of course not the 3 boxes.
Replies appreciated. [sample image is attached]
I would suggest you read up on template matching. There is also a good OpenCV tutorial on this.
For your use case, the idea would be to generate a stereotyped image of a rectangular box with the same shape (width/height ratio) as the boxes found on your documents. Depending on whether your input images show the document always in the same scaling or not, your would need to either resize the inputs to keep their magnification constant, or you would need to operate with a template bank (e.g. an array of box templates in various scalings).
Briefly, you would then cross-correlate the template box(es) with the input image and (in case of well-matched scaling) would find ideally relatively sharp peaks indicating the centers of your document boxes.
In the code above, use image pyramids (to merge unwanted contour noises) and cv2.findContours in combination. Post to that filtering list of Contours based on contour area cv2.contourArea will lead to only bigger squares.
There is also an alternate solution. Looking at images, we can see that the signature text usually is bigger than that of printed text in that ROI. so we can filter out contours smaller than signature contours and extract only the signature.
Its always good to remove noise before using cv2.findContours e.g. dilate, erode, blurring etc.

Irregular shape detection and measurement in python opencv

I'm attempting to do some image analysis using OpenCV in python, but I think the images themselves are going to be quite tricky, and I've never done anything like this before so I want to sound out my logic and maybe get some ideas/practical code to achieve what I want to do, before I invest a lot of time going down the wrong path.
This thread comes pretty close to what I want to achieve, and in my opinion, uses an image that should be even harder to analyse than mine. I'd be interested in the size of those coloured blobs though, rather than their distance from the top left. I've also been following this code, though I'm not especially interested in a reference object (the dimensions in pixels alone would be enough for now and can be converted afterwards).
Here's the input image:
What you're looking at are ice crystals, and I want to find the average size of each. The boundaries of each are reasonably well defined, so conceptually this is my approach, and would like to hear any suggestions or comments if this is the wrong way to go:
Image in RGB is imported and converted to 8bit gray (32 would be better based on my testing in ImageJ, but I haven't figured out how to do that in OpenCV yet).
The edges are optionally Gaussian blurred to remove noise
A Canny edge detector picks up the lines
Morphological transforms (erosion + dilation) are done to attempt to close the boundaries a bit further.
At this point it seems like I have a choice to make. I could either binarise the image, and measure blobs above a threshold (i.e. max value pixels if the blobs are white), or continue with the edge detection by closing and filling contours more fully. Contours seems complicated though looking at that tutorial, and though I can get the code to run on my images, it doesn't detect the crystals properly (unsurprisingly). I'm also not sure if I should morph transform before binarizing too?
Assuming I can get all that to work, I'm thinking a reasonable measure would be the longest axis of the minimum enclosing box or ellipse.
I haven't quite ironed out all the thresholds yet, and consequently some of the crystals are missed, but since they're being averaged, this isn't presenting a massive problem at the moment.
The script stores the processed images as it goes along, so I'd also like the final output image similar to the 'labelled blobs' image in the linked SO thread, but with each blob annotated with its dimensions maybe.
Here's what an (incomplete) idealised output would look like, each crystal is identified, annotated and measured (pretty sure I can tackle the measurement when I get that far).
Abridged the images and previous code attempts as they are making the thread overly long and are no longer that relevant.
Edit III:
As per the comments, the watershed algorithm looks to be very close to achieving what I'm after. The problem here though is that it's very difficult to assign the marker regions that the algorithm requires (http://docs.opencv.org/3.2.0/d3/db4/tutorial_py_watershed.html).
I don't think this is something that can be solved with thresholds through the binarization process, as the apparent colour of the grains varies by much more than the toy example in that thread.
Edit IV
Here are a couple of the other test images I've played with. It fares much better than I expected with the smaller crystals, and theres obviously a lot of finessing that could be done with the thresholds that I havent tried yet.
Here's 1, top left to bottom right correspond to the images output in Alex's steps below.
And here's a second one with bigger crystals.
You'll notice these tend to be more homogeneous in colour, but with harder to discern edges. Something I found a little suprising is that the edge floodfilling is a little overzealous with some of the images, I would have thought this would be particularly the case for the image with the very tiny crystals, but actually it appears to have more of an effect on the larger ones. There is probably a lot of room to improve the quality of the input images from our actual microscopy, but the more 'slack' the programming can take from the system, the easier our lives will be!
As I mentioned in the comments, watershed looks to be an ok approach for this problem. But as you replied, defining the foreground and the background for the markers is the hard part! My idea was to use the morphological gradient to get good edges along the ice crystals and work from there; the morphological gradient seems to work great.
import numpy as np
import cv2
img = cv2.imread('image.png')
blur = cv2.GaussianBlur(img, (7, 7), 2)
h, w = img.shape[:2]
# Morphological gradient
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
gradient = cv2.morphologyEx(blur, cv2.MORPH_GRADIENT, kernel)
cv2.imshow('Morphological gradient', gradient)
cv2.waitKey()
From here, I binarized the gradient using some thresholding. There's probably a cleaner way to do this...but this happens to work better than the dozen other ideas I tried.
# Binarize gradient
lowerb = np.array([0, 0, 0])
upperb = np.array([15, 15, 15])
binary = cv2.inRange(gradient, lowerb, upperb)
cv2.imshow('Binarized gradient', binary)
cv2.waitKey()
Now we have a couple issues with this. It needs some cleaning up as it's messy, and further, the ice crystals that are on the edge of the image are showing up---but we don't know where those crystals actually end so we should actually ignore those. To remove those from the mask, I looped through the pixels on the edge and used floodFill() to remove them from the binary image. Don't get confused here on the orders of rows and columns; the if statements are specifying rows and columns of the image matrix, while the input to floodFill() expects points (i.e. x, y form, which is opposite from row, col).
# Flood fill from the edges to remove edge crystals
for row in range(h):
if binary[row, 0] == 255:
cv2.floodFill(binary, None, (0, row), 0)
if binary[row, w-1] == 255:
cv2.floodFill(binary, None, (w-1, row), 0)
for col in range(w):
if binary[0, col] == 255:
cv2.floodFill(binary, None, (col, 0), 0)
if binary[h-1, col] == 255:
cv2.floodFill(binary, None, (col, h-1), 0)
cv2.imshow('Filled binary gradient', binary)
cv2.waitKey()
Great! Now just to clean this up with some opening and closing...
# Cleaning up mask
foreground = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
foreground = cv2.morphologyEx(foreground, cv2.MORPH_CLOSE, kernel)
cv2.imshow('Cleanup up crystal foreground mask', foreground)
cv2.waitKey()
So this image was labeled as "foreground" because it has the sure foreground of the objects we want to segment. Now we need to create a sure background of the objects. Now, I did this in the naïve way, which just is to grow your foreground a bunch, so that your objects are probably all defined within that foreground. However, you could probably use the original mask or even the gradient in a different way to get a better definition. Still, this works OK, but is not very robust.
# Creating background and unknown mask for labeling
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (17, 17))
background = cv2.dilate(foreground, kernel, iterations=3)
unknown = cv2.subtract(background, foreground)
cv2.imshow('Background', background)
cv2.waitKey()
So all the black there is "sure background" for the watershed. Also I created the unknown matrix, which is the area between foreground and background, so that we can pre-label the markers that get passed to watershed as "hey, these pixels are definitely in the foreground, these others are definitely background, and I'm not sure about these ones between." Now all that's left to do is run the watershed! First, you label the foreground image with connected components, identify the unknown and background portions, and pass them in:
# Watershed
markers = cv2.connectedComponents(foreground)[1]
markers += 1 # Add one to all labels so that background is 1, not 0
markers[unknown==255] = 0 # mark the region of unknown with zero
markers = cv2.watershed(img, markers)
You'll notice that I ran watershed() on img. You might experiment running it on a blurred version of the image (maybe median blurring---I tried this and got a little smoother of boundaries for the crystals) or other preprocessed versions of the images which define better boundaries or something.
It takes a little work to visualize the markers as they're all small numbers in a uint8 image. So what I did was assign them some hue in 0 to 179 and set inside a HSV image, then convert to BGR to display the markers:
# Assign the markers a hue between 0 and 179
hue_markers = np.uint8(179*np.float32(markers)/np.max(markers))
blank_channel = 255*np.ones((h, w), dtype=np.uint8)
marker_img = cv2.merge([hue_markers, blank_channel, blank_channel])
marker_img = cv2.cvtColor(marker_img, cv2.COLOR_HSV2BGR)
cv2.imshow('Colored markers', marker_img)
cv2.waitKey()
And finally, overlay the markers onto the original image to check how they look.
# Label the original image with the watershed markers
labeled_img = img.copy()
labeled_img[markers>1] = marker_img[markers>1] # 1 is background color
labeled_img = cv2.addWeighted(img, 0.5, labeled_img, 0.5, 0)
cv2.imshow('watershed_result.png', labeled_img)
cv2.waitKey()
Well, that's the pipeline in it's entirety. You should be able to copy/paste each section in a row and you should be able to get the same results. The weakest parts of this pipeline is binarizing the gradient and defining the sure background for watershed. The distance transform might be useful in binarizing the gradient somehow, but I haven't gotten there yet. Either way...this was a cool problem, I would be interested to see any changes you make to this pipeline or how it fares on other ice-crystal images.

Categories