Opencv - Extracting data from in-game images - python

I need some help with an OpenCV project I'm working on. I'm taking images from a computer game (in this case, Fortnite), and I would like to extract different elements from them, eg. timer value, quantities of materials, health and shield etc.
Currently I perform a series of image preprocessing functions until I get a binary image, followed by locating the contours in the image and then sending those contours to a machine learning algorithm (K-Nearest-Neighbours).
I am able to succeed in a lot of cases, but there are some images where I don't manage to find some of the contours, therefore I don't find all the data.
An important thing to note is that I use the same preprocessing pipeline for all images, because I'm looking for as robust of a solution that I can manage.
I would like to know what I can do to improve the performance of my program. -
Is KNN a good model for this sort of task, or are there other models that might give me better results?
Is there any way to recognise characters without locating contours?
How can I make my preprocessing pipeline as robust as possible, given the fact that there is a lot of variance in the background across all images?
My goal is to process the images as fast as possible, starting out with a minimum of at least 2 images per second.
Thanks in advance for any help or advice you can give me!
Here is an example image before preprocessing
Here is the image after preprocessing, in this example I cannot find the contour for the 4 on the right side.

Quite simply, enlarging the image might help, since it increases the dark border of the number.
I threw together some code that does that. The result could be improved, but my point here is to show that the 4 can now be detected as a contour. To increase efficiency I only selected contours within a certain size.
Also, since it is part of the HUD, that usually means that the location on screen is always the same. If so, you can get great performance increase by only selecting the area with values (described here) - as I have done manually.
Finally, since the numbers have a consistent shape, you could try matchShapes as an alternative to kNN to recognize the numbers. I don't know how they compare in performance though, so you'll have to try that out yourself.
Result:
Code:
import numpy as np
import cv2
# load image
img = cv2.imread("fn2.JPG")
# enlarge image
img = cv2.resize(img,None,fx=4, fy=4, interpolation = cv2.INTER_CUBIC)
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
# create mask using threshold
ret,mask = cv2.threshold(gray,200,255,cv2.THRESH_BINARY)
# find contours in mask
im, contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# draw contour on image
for cnt in contours:
if cv2.contourArea(cnt) < 3000 and cv2.contourArea(cnt) > 200:
cv2.drawContours(img, [cnt], 0, (255,0,0), 2)
#show images
cv2.imshow("Mask", mask)
cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Related

Preprocess bad scans (partially blurred, shadowed and slightly skewed) for OCR

I try to read documents from various sources with Python. Therefore I am using OpenCV and Tesseract. To optimize the Tesseract performance, I do some preprocessing, but sadly the documents also vary in quality a lot. My current issue are documents that are only partially blurry or shaded, due to bad scans.
I have no influence on the document quality and manual feature detection is not applicable, because the code should run over hundred thousands of documents in the end and even inside a document the quality can vary strongly.
To get rid of the shadow, I found a technique with delating and blurring the image and dividing the original with the dilated version.
h, w = img.shape
kernel = np.ones((7, 7), np.uint8)
dilation = cv2.dilate(img, kernel, iterations=1)
blurred_dilation = cv2.GaussianBlur(dilation, (13, 13), 0)
resized = cv2.resize(blurred_dilation, (w, h))
corrected = img / resized * 255
That works very well.
But I still got that blur and optically it got worse to read.
I'd like to do a binarisation next, but then would be nothing valuable left from the blurred parts.
I found an example of a deconvolution that works for motion blur, but I can only apply it to the whole image, which blurs the rest of the text, and I need to know the direction of the motion blur.
So, I hope to get some help on how to optimize this kind of image, so that tesseract can properly read it.
I know that there should be further optimizations besides sharpening the blurred text. Deskewing and getting rid of the fragments of another pages.
These I am not sure about the proper sequence how to perform these additional steps.
I can hardly find sources or tutorials for plain document optimization for OCR processes.
Often the procedures apply globally to the whole image or are for non OCR applications.
Reminds me of this article I read a few years ago: https://medium.com/illuin/cleaning-up-dirty-scanned-documents-with-deep-learning-2e8e6de6cfa6
Contrary to the title, it contains a variety of classic computer vision algorithms for your inspiration.
To remove shadow, I've personally had median filtering as described (removing a median-filtered background) work more effectively than what you show here.
To deskew, I've experimented with Hough transform and got good results.
Intuitively, if you should know the font type and size in advance, that should help as well.
import cv2
import numpy as np
import skimage.filters as filters
# read the image
img = cv2.imread("input/ocr.png", 0)
# blur
blurred_dilation = cv2.GaussianBlur(img, (91, 91), 0)
# divide gray by morphology image
division = cv2.divide(img, blurred_dilation, scale=255)
# sharpen using unsharp masking
sharp = filters.unsharp_mask(division, radius=11, amount=11, multichannel=False, preserve_range=False)
sharp = (255 * sharp).clip(0, 255).astype(np.uint8)
# threshold
thresh = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# save results
cv2.imwrite('receipt_division_sharp.png', sharp)
cv2.imwrite('receipt_division_thresh.png', thresh)
result,
result with threshold
method: unsharp_mask filter, Otsu's method (1979)
ref: OpenCV: Contour detection of shadowed image before OCR
(2020 stack overflow)
If I were You, I'll try GAN. Though raw data is blurred and shadowed, You need clear data for tesseract. So You need to generate clear character from blurred raw data.

morphological transformation opencv noob questiong

hope you all have good day today. so I'm here learning python,opencv on a raspberry pi and Im hoping that someone can explain what the code below do, I've read from https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html and it doesn't explain what iterations mean and how to choose the best one? and what's the use for object detection,
thank you.
for frame in cam.capture_continuous(raw, format="bgr", use_video_port=True):
frame = frame.array
hsv = cv2.cvtColor(frame,cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv,colorLower,colorUpper)
mask = cv2.blur(mask,(3,3))
mask= cv2.dilate(mask,None,iterations=5)
mask= cv2.erode(mask,None,iterations=1)
mask= cv2.dilate(mask,None,iterations=3)
me,thresh = cv2.threshold(mask,127,255,cv2.THRESH_BINARY)
cnts = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[-2]
center = None
It is quite an obvious concept, either way, as stated in the documentation:
iterations: number of times dilation is applied.
So iterations=5 means 5 consecutive times we're applying dilation.
Regarding the applications for this procedure, I recommend you to read the Eroding and Dilating Tutorial from OpenCV which explains quite clearly how these methods work, but the absolute basics is that we need to erode and dilate to remove noise from the image and also close "holes" that wouldn't make a solid contour.
The purpose of this transformations is to have a very solid image of the objects we're looking for, to later detect contours.
The values for the number of iterations and transformations we apply vary greatly depending on the video, camera features, resolution, amount of noise, etc. You should experiment and find the optimal for your specific situation, or go for a generic erode + dilate and hope for best.

Detecting overlay in video with python

I am working with frames from a video. The video is overlaid with several semi-transparent boxes and my goal is to find the coordinates of these boxes. These boxes are the only fixed points in the video - the camera is moving, color intensity changes, there is no fixed reference. The problem is that the boxes are semi-transparent, so they also change with the video, albeit not as much. It seems that neither background substraction nor tracking have the right tools for this problem.
Nevertheless, I've tried the background substractors that come with cv2 as well as some homebrewn methods using differences between frames and thresholding. Unfortunately, these don't work due to the box transparency.
For reference, here is what the mean difference between the first 50 frames looks like:
And here is what cv2 background subtractor KNN returns:
I've experimented with thresholds, number of frames taken into account, various contouring algorithms, blurring/sharpening/etc. I've also tried techniques from document layout analysis.
I wonder if maybe there is something I'm missing due to not knowing the right keyword. I don't expect anyone here to give me the perfect solution, but any pointers as to where to look/what approach to try, are appreciated. I'm not bound to cv2 either, anything that works in python will do.
If you take a sample of random frames as elements of an array, and calculate the FFT, all the semi-transparent boxes will have a very high signal, and the rest of the pixels would behave as noise, so noise remotion will filter away the semi-transparent boxes. You can add the result of your other methods as additional frames for the fft
You are trying to find something that does not changes on the entire video, so do not use consecutive frames, or if you are forced to use consecutive frames, shuffle them randomly.
To gain speed, you may only take only one color channel from each frame, and pick the color channel randomly. That way the colors becomes noise, and cancel each other.
If the FFT is too expensive, just averaging random frames should filter the noise.
Ok here is first step, you can make Canny from that image, from canny you can make countours:
import cv2
import random as rng
image = cv2.imread("c:\stackoverflow\interface.png")
edges = cv2.Canny(image, 100, 240)
contoursext, hierarchy = cv2.findContours(
edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#cv2.RETR_EXTERNAL would work better if the image would not be framed.
for i in range(len(contoursext)):
color = (rng.randint(0,256), rng.randint(0,256), rng.randint(0,256))
cv2.drawContours(image, contoursext, i, color, 1, cv2.LINE_8, hierarchy, 0)
# Show in a window
cv2.imshow("Canny", edges)
cv2.imshow("Contour", image)
cv2.waitKey(0)
Then you can test if the contour or combination of 2 contours is rectangles for example...wich would probably detect most of the rectangle overlays...
Or Also you can try to detect canny lines if they are similar to rectangles.

Detect rectanglular signature fields in document scans using OpenCV

I am trying to extract rectangular big boxes from document images with signatures in it. Since i don't have training data (for deep learning), i want to cut rectangular boxes (3 in all images) from these images using OpenCV.
Here is what I tried:
import numpy as np
import cv2
img = cv2.imread('S-0330-444-20012800.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(gray,127,255,1)
contours,h = cv2.findContours(thresh,1,2)
for cnt in contours:
approx = cv2.approxPolyDP(cnt,0.02*cv2.arcLength(cnt,True),True)
if len(approx)==4:
cv2.drawContours(img,[cnt],0,(26,60,232),-1)
cv2.imshow('img',img)
cv2.waitKey(0)
sample image
With the above code, I get a lot of squares (around 152 small points like squares) and of course not the 3 boxes.
Replies appreciated. [sample image is attached]
I would suggest you read up on template matching. There is also a good OpenCV tutorial on this.
For your use case, the idea would be to generate a stereotyped image of a rectangular box with the same shape (width/height ratio) as the boxes found on your documents. Depending on whether your input images show the document always in the same scaling or not, your would need to either resize the inputs to keep their magnification constant, or you would need to operate with a template bank (e.g. an array of box templates in various scalings).
Briefly, you would then cross-correlate the template box(es) with the input image and (in case of well-matched scaling) would find ideally relatively sharp peaks indicating the centers of your document boxes.
In the code above, use image pyramids (to merge unwanted contour noises) and cv2.findContours in combination. Post to that filtering list of Contours based on contour area cv2.contourArea will lead to only bigger squares.
There is also an alternate solution. Looking at images, we can see that the signature text usually is bigger than that of printed text in that ROI. so we can filter out contours smaller than signature contours and extract only the signature.
Its always good to remove noise before using cv2.findContours e.g. dilate, erode, blurring etc.

Irregular shape detection and measurement in python opencv

I'm attempting to do some image analysis using OpenCV in python, but I think the images themselves are going to be quite tricky, and I've never done anything like this before so I want to sound out my logic and maybe get some ideas/practical code to achieve what I want to do, before I invest a lot of time going down the wrong path.
This thread comes pretty close to what I want to achieve, and in my opinion, uses an image that should be even harder to analyse than mine. I'd be interested in the size of those coloured blobs though, rather than their distance from the top left. I've also been following this code, though I'm not especially interested in a reference object (the dimensions in pixels alone would be enough for now and can be converted afterwards).
Here's the input image:
What you're looking at are ice crystals, and I want to find the average size of each. The boundaries of each are reasonably well defined, so conceptually this is my approach, and would like to hear any suggestions or comments if this is the wrong way to go:
Image in RGB is imported and converted to 8bit gray (32 would be better based on my testing in ImageJ, but I haven't figured out how to do that in OpenCV yet).
The edges are optionally Gaussian blurred to remove noise
A Canny edge detector picks up the lines
Morphological transforms (erosion + dilation) are done to attempt to close the boundaries a bit further.
At this point it seems like I have a choice to make. I could either binarise the image, and measure blobs above a threshold (i.e. max value pixels if the blobs are white), or continue with the edge detection by closing and filling contours more fully. Contours seems complicated though looking at that tutorial, and though I can get the code to run on my images, it doesn't detect the crystals properly (unsurprisingly). I'm also not sure if I should morph transform before binarizing too?
Assuming I can get all that to work, I'm thinking a reasonable measure would be the longest axis of the minimum enclosing box or ellipse.
I haven't quite ironed out all the thresholds yet, and consequently some of the crystals are missed, but since they're being averaged, this isn't presenting a massive problem at the moment.
The script stores the processed images as it goes along, so I'd also like the final output image similar to the 'labelled blobs' image in the linked SO thread, but with each blob annotated with its dimensions maybe.
Here's what an (incomplete) idealised output would look like, each crystal is identified, annotated and measured (pretty sure I can tackle the measurement when I get that far).
Abridged the images and previous code attempts as they are making the thread overly long and are no longer that relevant.
Edit III:
As per the comments, the watershed algorithm looks to be very close to achieving what I'm after. The problem here though is that it's very difficult to assign the marker regions that the algorithm requires (http://docs.opencv.org/3.2.0/d3/db4/tutorial_py_watershed.html).
I don't think this is something that can be solved with thresholds through the binarization process, as the apparent colour of the grains varies by much more than the toy example in that thread.
Edit IV
Here are a couple of the other test images I've played with. It fares much better than I expected with the smaller crystals, and theres obviously a lot of finessing that could be done with the thresholds that I havent tried yet.
Here's 1, top left to bottom right correspond to the images output in Alex's steps below.
And here's a second one with bigger crystals.
You'll notice these tend to be more homogeneous in colour, but with harder to discern edges. Something I found a little suprising is that the edge floodfilling is a little overzealous with some of the images, I would have thought this would be particularly the case for the image with the very tiny crystals, but actually it appears to have more of an effect on the larger ones. There is probably a lot of room to improve the quality of the input images from our actual microscopy, but the more 'slack' the programming can take from the system, the easier our lives will be!
As I mentioned in the comments, watershed looks to be an ok approach for this problem. But as you replied, defining the foreground and the background for the markers is the hard part! My idea was to use the morphological gradient to get good edges along the ice crystals and work from there; the morphological gradient seems to work great.
import numpy as np
import cv2
img = cv2.imread('image.png')
blur = cv2.GaussianBlur(img, (7, 7), 2)
h, w = img.shape[:2]
# Morphological gradient
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
gradient = cv2.morphologyEx(blur, cv2.MORPH_GRADIENT, kernel)
cv2.imshow('Morphological gradient', gradient)
cv2.waitKey()
From here, I binarized the gradient using some thresholding. There's probably a cleaner way to do this...but this happens to work better than the dozen other ideas I tried.
# Binarize gradient
lowerb = np.array([0, 0, 0])
upperb = np.array([15, 15, 15])
binary = cv2.inRange(gradient, lowerb, upperb)
cv2.imshow('Binarized gradient', binary)
cv2.waitKey()
Now we have a couple issues with this. It needs some cleaning up as it's messy, and further, the ice crystals that are on the edge of the image are showing up---but we don't know where those crystals actually end so we should actually ignore those. To remove those from the mask, I looped through the pixels on the edge and used floodFill() to remove them from the binary image. Don't get confused here on the orders of rows and columns; the if statements are specifying rows and columns of the image matrix, while the input to floodFill() expects points (i.e. x, y form, which is opposite from row, col).
# Flood fill from the edges to remove edge crystals
for row in range(h):
if binary[row, 0] == 255:
cv2.floodFill(binary, None, (0, row), 0)
if binary[row, w-1] == 255:
cv2.floodFill(binary, None, (w-1, row), 0)
for col in range(w):
if binary[0, col] == 255:
cv2.floodFill(binary, None, (col, 0), 0)
if binary[h-1, col] == 255:
cv2.floodFill(binary, None, (col, h-1), 0)
cv2.imshow('Filled binary gradient', binary)
cv2.waitKey()
Great! Now just to clean this up with some opening and closing...
# Cleaning up mask
foreground = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
foreground = cv2.morphologyEx(foreground, cv2.MORPH_CLOSE, kernel)
cv2.imshow('Cleanup up crystal foreground mask', foreground)
cv2.waitKey()
So this image was labeled as "foreground" because it has the sure foreground of the objects we want to segment. Now we need to create a sure background of the objects. Now, I did this in the naïve way, which just is to grow your foreground a bunch, so that your objects are probably all defined within that foreground. However, you could probably use the original mask or even the gradient in a different way to get a better definition. Still, this works OK, but is not very robust.
# Creating background and unknown mask for labeling
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (17, 17))
background = cv2.dilate(foreground, kernel, iterations=3)
unknown = cv2.subtract(background, foreground)
cv2.imshow('Background', background)
cv2.waitKey()
So all the black there is "sure background" for the watershed. Also I created the unknown matrix, which is the area between foreground and background, so that we can pre-label the markers that get passed to watershed as "hey, these pixels are definitely in the foreground, these others are definitely background, and I'm not sure about these ones between." Now all that's left to do is run the watershed! First, you label the foreground image with connected components, identify the unknown and background portions, and pass them in:
# Watershed
markers = cv2.connectedComponents(foreground)[1]
markers += 1 # Add one to all labels so that background is 1, not 0
markers[unknown==255] = 0 # mark the region of unknown with zero
markers = cv2.watershed(img, markers)
You'll notice that I ran watershed() on img. You might experiment running it on a blurred version of the image (maybe median blurring---I tried this and got a little smoother of boundaries for the crystals) or other preprocessed versions of the images which define better boundaries or something.
It takes a little work to visualize the markers as they're all small numbers in a uint8 image. So what I did was assign them some hue in 0 to 179 and set inside a HSV image, then convert to BGR to display the markers:
# Assign the markers a hue between 0 and 179
hue_markers = np.uint8(179*np.float32(markers)/np.max(markers))
blank_channel = 255*np.ones((h, w), dtype=np.uint8)
marker_img = cv2.merge([hue_markers, blank_channel, blank_channel])
marker_img = cv2.cvtColor(marker_img, cv2.COLOR_HSV2BGR)
cv2.imshow('Colored markers', marker_img)
cv2.waitKey()
And finally, overlay the markers onto the original image to check how they look.
# Label the original image with the watershed markers
labeled_img = img.copy()
labeled_img[markers>1] = marker_img[markers>1] # 1 is background color
labeled_img = cv2.addWeighted(img, 0.5, labeled_img, 0.5, 0)
cv2.imshow('watershed_result.png', labeled_img)
cv2.waitKey()
Well, that's the pipeline in it's entirety. You should be able to copy/paste each section in a row and you should be able to get the same results. The weakest parts of this pipeline is binarizing the gradient and defining the sure background for watershed. The distance transform might be useful in binarizing the gradient somehow, but I haven't gotten there yet. Either way...this was a cool problem, I would be interested to see any changes you make to this pipeline or how it fares on other ice-crystal images.

Categories