I always wanted to have a device that, from a live camera feed, could detect an object, create a 3D model of it, and then identify it. It would work a lot like the Scanner tool from Subnautica. Imagine my surprise when I found OpenCV, a free-to-use computer vision tool for Python!
My first step is to get the computer to recognize that there is an object at the center of the camera feed. To do this, I found the Canny() function, which detects edges and displays them as white lines on a black image; this should produce a complete outline of the object in the center. I also used the floodFill() function to fill in the black zone between the white lines with gray, which would show that the computer recognizes that there is an object there. My attempt is shown in the following image.
The red dot is the center of the live video.
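Roughly, my code looks like this (simplified; the Canny thresholds are just values I've been experimenting with):

```python
import cv2
import numpy as np

frame = cv2.imread("frame.png")  # a frame grabbed from the live feed
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# White edge lines on a black background
edges = cv2.Canny(gray, 50, 150)

# floodFill needs a mask 2 pixels larger than the image
h, w = edges.shape
mask = np.zeros((h + 2, w + 2), np.uint8)
center = (w // 2, h // 2)  # the red dot

# Fill the black zone around the center with gray (128)
cv2.floodFill(edges, mask, center, 128)

cv2.imshow("filled", edges)
cv2.waitKey(0)
```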
The issue is that the edge lines can have holes in them due to blur between two colors, which can range from individual missing pixels to entire missing lines. As a result, the gray escapes and doesn't highlight me as the only object; instead it highlights the entire wall as well. Is there a way to fill in those missing pixels, or is there a better way of doing this?
Welcome to SO and the exciting world of machine vision!
What you are describing is a very classical problem in the field, and not a trivial one at all. It depends heavily on the shape and appearance of what you define as the object of interest and the overall structure, homogeneity and color of the background. Remember, the computer has no concept of what an "object" is, the only thing it 'knows' is a matrix of numbers.
In your example, you might start by selecting the background area by color (or hue; look up HSV). Everything else is your object. This is what classical greenscreening techniques do, and it only works with (a) a homogeneous background that does not share a color with your object and (b) a single object, or multiple non-overlapping objects.
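A minimal OpenCV sketch of that idea (the HSV range is a placeholder you'd tune to your actual background):

```python
import cv2
import numpy as np

img = cv2.imread("frame.png")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Suppose the wall is roughly blue-ish: everything inside this range is
# "background", everything else is "object". Tune the bounds to your scene.
lower = np.array([100, 50, 50])
upper = np.array([130, 255, 255])
background = cv2.inRange(hsv, lower, upper)
obj_mask = cv2.bitwise_not(background)

cv2.imshow("object mask", obj_mask)
cv2.waitKey(0)
```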
The problem with your edge-based approach is that you can't rely on getting a closed edge, and deciding where the inside and outside of the object are might get tricky.
Advanced ways to do this would get you into Neural Network territory, but maybe try to get the basics down first.
Here are two links to tutorials on converting color spaces and extracting contours:
https://docs.opencv.org/4.x/df/d9d/tutorial_py_colorspaces.html
https://docs.opencv.org/3.4/d4/d73/tutorial_py_contours_begin.html
Once you've got that figured out, look into stereo vision or 3D imaging in general, and that Subnautica scanner might just become reality some day ;)
Good luck!
I am reading the slides on temporal filtering from a computer vision class (page 108), and I am wondering: how can we do temporal filtering for videos?
For example, they say our data is a video in XYT, where X, Y are the spatial dimensions and T is time.
"How could we create a filter that keeps sharp objects that move at some velocity (vx, vy) while blurring the rest?"
They sort of derive the formula for that, but I'm confused about how to apply it.
How can we do filtering in the Fourier domain, and how should we apply that in general? Can someone please explain how I should code it?
In that example, they're talking about a specific known speed. For example, suppose you know that a car is moving left at 2 pixels per frame. It's possible to make a video that blurs everything except that car.
Here's the idea: start at frame 0 of the video. At each pixel, look one frame into the future, and 2 pixels to the left. You will be looking at the same part of the moving car. Now, imagine you take the average color value of your current pixel & the future pixel (the one that is 2 pixels left, and 1 frame in the future). If your pixel is on the moving car, both pixels will be the exact same color, so taking the average has no effect. On the other hand, if it's NOT on the moving car, they'll be different colors, and so taking the average will have the effect of blurring between them.
Thus, the pixels of the car will be unchanged, but the rest of the video will get a blur. Repeat for each frame. You can also include more frames in your filter; e.g. you could look 2 frames in the future and 4 pixels left, or 1 frame in the past and 2 pixels right.
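A minimal NumPy sketch of that idea (frames is a list of equal-sized arrays; positive vx means rightward motion, so a car moving left at 2 px/frame is vx = -2):

```python
import numpy as np

def velocity_filter(frames, vx, vy):
    """Average each frame with the next frame shifted back by (vx, vy).

    Pixels moving at exactly (vx, vy) pixels/frame line up with themselves
    and stay sharp; everything else is blurred with a different pixel.
    """
    out = []
    for t in range(len(frames) - 1):
        cur = frames[t].astype(np.float32)
        # Shift the future frame so the moving object lands back on its
        # current position. np.roll wraps at the borders, which is fine
        # for a demo.
        nxt = np.roll(frames[t + 1].astype(np.float32),
                      shift=(-vy, -vx), axis=(0, 1))
        out.append(((cur + nxt) / 2).astype(np.uint8))
    return out

# Car moving left at 2 px/frame -> vx = -2, vy = 0
# filtered = velocity_filter(frames, vx=-2, vy=0)
```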
Note: this was a teaching example; I don't think there are many real computer vision applications for this (at least, not as a standalone technique), because it's so fragile. If the car speeds up or slows down slightly, it gets blurred.
I am new to the computer vision field and am looking for your guidance on how to approach the following scenario:
What approach should we follow to do quality control (QC) on small, thin metal rings using computer vision?
Below is the detailed requirement (this is the best I can share):
To begin with, I have attached a picture of the ring we need to do QC of.
Ring_for_QC
Ring diameter = 3 inch
We need to do the following checks:
1. Surface coating of the ring peeled off
2. Portion of the ring chipped off
3. Scratch on the ring's surface
4. Width of the ring is uneven
5. Dent on the ring
6. Entire surface of the ring is not completely horizontal to the plane; perhaps due to a dent, part of the ring rests on the plane surface, creating a 1- or 2-degree angle (I have marked no. 6 as 'uneven surface' in the attached picture)
I have also attached another picture marking the quality issues found on a random ring: elevated view with marked QC issues
Scenario:
One single ring can have one or more of the above-mentioned 6 defects
Issues 1 & 3 can occur on either surface of the ring, and we need to check both surfaces
We need to QC one single ring at a time
Challenge:
- Need to set up a workstation to capture an image or video of each ring under inspection
How many cameras should that workstation have, and at what angle should each camera be placed?
Since we need to check both sides of the ring, we need to decide whether:
we place the ring on a transparent surface and take images of both sides,
or
we flip the ring after the image of one side is taken
The next challenge is what computer vision technique we should employ to identify all these issues.
For the time being, we are doing some research around OpenCV's background subtraction methods.
It will be helpful to get some insight from you on what would be a better/feasible approach.
Since this is for a student project I'll emphasize image processing more than other aspects of an application. See the bottom section for considerations for real-world applications.
That aside, a general comment: implementing vision for quality control (QC) is hard to get right. If the product to be inspected is cheap (e.g. a ring, a small plastic thing), and if the result of the vision inspection is a borderline pass/fail, or uncertain, you can reject the part. If the part to be inspected is expensive (e.g. a large assembly for a tractor, individual CPUs, medical devices near the end of the production line), then you have to have very well defined specifications, and the system needs to be made as robust as possible.
In general, you want to optimize imaging for each type of defect. For example, the camera location, lens, and lighting to detect scratches may be quite different than what is needed for dimensional gauging (a.k.a. dimensional measurement).
Machine Vision vs. Computer Vision
When you search online for algorithms, equipment, and techniques specific to vision for industrial automation, including the quality control of parts on production lines, then for English-language websites favor the term "machine vision" instead of "computer vision."
https://en.wikipedia.org/wiki/Machine_vision
Machine vision is the common industry term for image processing (+ cameras + lighting + ...) for industrial use. Although different people may use different terminology, and the terminology isn't as important as learning techniques, you'll find a lot of material by searching for "machine vision." The term "computer vision" tends to be used for non-industrial applications, and for academic research, though in languages other than English the terms "machine vision" and "computer vision" may be the same. By comparison, "medical imaging" is similar to machine vision, but involves application of image processing to medical applications.
Lighting
Most importantly, you must control the lighting. Ambient lighting, such as desk lamps, overhead lights, etc., is not only useless for a vision system inspecting parts in production, but will typically interfere with image processing. You might find some defects sometimes with poorly controlled light, but to generate the most consistent results, you'll need to set up lights in specific locations, run the lights at specific, verifiable intensities, and have your vision system detect when something has gone wrong with the lighting.
There are "machine vision lights" designed especially for specific applications such as finding scratches in shiny surfaces, making shiny surfaces look less shiny, to backlight parts (which is useful for dimensional gauging), to illuminate parts from low angles, and so on. Read about different types of lighting.
https://smartvisionlights.com/
https://www.vision-systems.com/content/dam/VSD/solutionsinvision/Resources/lighting_tips_white_paper.pdf
Rather than spend a lot of money on special lights, you can mock them up:
LED flashlight or single LED (as a "point" light source)
Bright light + translucent sheet of plastic (for backlighting)
White tissue paper or some other diffusing material in front of a bright light
...
The importance of lighting cannot be overstated. Controlling lighting conditions improves the chance of success, and is typically necessary to achieve the accuracy of measurement or pass/fail assessment required in real-world environments.
Accuracy, Correctness, Usefulness
At some point you'll probably wonder whether machine learning is useful or necessary for the application. The question to ask yourself (or the customer) is this: what percentage of defects would need to be detected?
For example, if a chip is missing from the ring that could be a fatal defect. Is the ring used in some safety-critical application? If so, vision inspection for QC would have to be extraordinarily robust.
Even if you're familiar with the terms "accuracy" and "precision," make sure they have very clear meanings as you consider image processing problems:
https://en.wikipedia.org/wiki/Accuracy_and_precision
So, what percentage of chip defects needs to be found? 90%? 95%? 98%?
Using the term "accurate" more loosely to mean "the vision system gets the measurement correct and/or finds the defects we know are there," what is the accuracy of the most accurate machine learning algorithm you've read about? Or at least, what would qualify as reasonably impressive accuracy for machine learning? 95%? 98%?
If you're making measurements of machine parts on a production line, then you would typically want the accuracy of dimensional measurements and defect detection to be 99% or better. For high-value products, and products such as electronic components that are highly sensitive to defects, accuracy may need to be 99.999% or better. Think of it this way: if a manufacturer is making thousands or tens of thousands of parts, they don't want garbage parts to make it past your vision system several times a day.
Machine learning for image processing has been around a long time. Processing speeds, memory, and training set sizes have improved, and there have been improvements in algorithms as well, but it's important to note that machine learning is suitable only for some applications, and will fail miserably at other applications.
Techniques
To begin with, I have attached a picture of the ring we need to do QC of.
Ring_for_QC
Ring diameter = 3 inch
Get the exact diameter, including tolerances. If the nominal diameter is 3.000 inches, then the tolerance might be expressed in thousandths of an inch. You may not need to know that for a student project, but if you were proposing a solution for a factory owner you wouldn't want to even suggest a price or timeline for delivery without having complete specs for the part, and numerous samples of the part.
From the one image it's not possible to be too specific about what a defect might look like--the same part can have different defects in different factories, or even on different production lines of the same factory--but we can make some guesses.
1. Surface coating of the ring peeled off
From the one image it's not clear what the surface coating is supposed to look like, or what's underneath. You must provide at least one image of a good part, and at least one image for each type of defect.
What is the surface coating? Anodization? Paint? Enamel? Plastic? Cheese? Whatever the case, knowing what material it is, and how that material degrades, will give some clues about what sort of vision setup may help detect problems with the coating. Changes in coating quality can affect apparent texture (e.g. edge content), brightness/darkness (intensity), color, shininess, and so on.
For the moment, let's assume the coating peeling off changes the brightness or texture of the uncoated surface vs. the remaining coated surface. Then your image processing might look something like the following:
Determine whether a ring is in the image
Segment the ring from the background. That is, use an algorithm such as connected components (OpenCV's findContours()), SIFT, or some other technique to identify the presence and location of a rigid object of known size and shape from the background.
Isolate further processing to just those pixels corresponding to the surface of the part.
Use some technique to find clusters of different texture differences, brightness differences, etc. This is where a better description of the coating is required. If lighting and lens parameters are "fixed," you can consider generating a histogram of brightness values in the image (0 = black, 255 = white) and then comparing the histogram of good parts and bad parts--is there some statistical difference? Or you might use connected components (findContours() again) to cluster pixels of different colors, assuming the lack of coating changes the apparent color of the part: maybe the coating is brown and the part is silvery.
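As a rough sketch of that outline in OpenCV (assuming a high-contrast image and Otsu thresholding for segmentation; the filename is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("ring.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename

# Segment the part from the background (Otsu picks the threshold for us)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
ring = max(contours, key=cv2.contourArea)  # assume the largest blob is the ring

# Restrict further processing to the ring's pixels
mask = np.zeros_like(img)
cv2.drawContours(mask, [ring], -1, 255, thickness=cv2.FILLED)

# Histogram of brightness over the ring surface only; compare against
# histograms collected from known-good parts, e.g. with
# cv2.compareHist(hist, good_hist, cv2.HISTCMP_CORREL)
hist = cv2.calcHist([img], [0], mask, [256], [0, 256])
```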
It's hard to guess what technique would be relevant here without photos and/or a much more specific description of the coating. Hopefully this makes it clear why specs are important.
Coatings can be absent in different ways: peeling, small absences (voids), partially scraped away, etc. It can be difficult to predict in advance what the shape and size of missing coating may be.
When the size and shape of a defect is hard to predict, but when the defect is associated with a difference in image intensity (pixel brightness) or color, then explore these ideas:
Generate an "edge image" in which you find brightness/color transitions. You start with the grayscale or color image, then use Sobel or Canny or some other algorithm to generate an image of edge intensities.
Apply statistical methods to determine how "edgy" an image is. Are there more than N pixels (or more than 5% of all pixels) with an edge strength greater than S?
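A rough sketch of that statistic (S and the 5% cutoff are placeholder values to tune on real samples):

```python
import cv2
import numpy as np

gray = cv2.imread("ring.png", cv2.IMREAD_GRAYSCALE)

# Edge image: gradient magnitude from Sobel derivatives
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
edge_strength = cv2.magnitude(gx, gy)

S = 100.0  # minimum edge strength that counts as "edgy"
fraction = np.count_nonzero(edge_strength > S) / edge_strength.size

if fraction > 0.05:  # more than 5% strong-edge pixels
    print("suspiciously edgy -> possible coating defect")
```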
Once you have some basic algorithm that identifies the difference between good parts and parts with some missing coating, then you could consider using machine learning to review lots (lots!) of samples to help determine the best parameterization. For example, how do you know what number of edge pixels or edge pixel strength should be considered "bad"?
2. Portion of ring chipped off
It depends on whether the chip is visible just from the part's outline. For example, if you placed the part on a light table (a.k.a. "backlight"), would you always see a defect considered to be a "chip"? Or could the chip just be on the top surface facing the camera?
To find chips on edges, having the part on a backlight simplifies matters greatly.
Identify the location and orientation of the part (e.g. using connected components, normalized correlation, SIFT, or whatever algorithm is suitable for the part and the accuracy of location required).
Find edges corresponding to the outer and inner rings of the part.
Fit a circle, or a nearly circular ellipse, to the edge points using a Hough circle fit, a RANSAC circle fit, or (meh) a least-squares circle fit, parameterized to the known dimensions (in pixels) of the outer and inner ring diameters.
For the points used for the circle fits, find the point-to-circle (or point-to-ellipse) shortest distance. The larger this distance, the more likely you have a chip or missing chunk.
To ensure you're finding indentations, chips, or whatever, and not just individual "noise" edge points, examine points in order going clockwise or anticlockwise, and only consider a series of perimeter points as defects if N successive points have a median (or possibly mean) point-to-edge distance greater than some threshold distance D.
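Here's a minimal sketch of the (meh) least-squares variant, with the successive-points check from the last step; max_dev and min_run are placeholder parameters:

```python
import numpy as np

def fit_circle(pts):
    """Least-squares (Kasa) circle fit; pts is an (N, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones(len(pts))])
    b = x**2 + y**2
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = c[0] / 2, c[1] / 2
    r = np.sqrt(c[2] + cx**2 + cy**2)
    return cx, cy, r

def chip_candidates(pts, max_dev=2.0, min_run=5):
    """Indices where min_run successive perimeter points sit more than
    max_dev pixels from the fitted circle. pts must be ordered around
    the perimeter."""
    cx, cy, r = fit_circle(pts)
    dev = np.abs(np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r)
    runs, count = [], 0
    for i, bad in enumerate(dev > max_dev):
        count = count + 1 if bad else 0
        if count == min_run:
            runs.append(i - min_run + 1)
    return runs
```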
A simpler approach could be to fit a black-and-white mask--a template--representing a good part to the current location and rotation of the part to be inspected. If the template and sample part are aligned very precisely, and if you perform image subtraction, then you may be fortunate enough to get clusters or pixels where there are defects. But this method is fairly crude, and harder to make robust.
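A crude sketch of that subtraction idea (it assumes the masks have already been aligned; filenames are placeholders):

```python
import cv2

# Both masks are assumed to be pre-aligned binary images of the same size
template = cv2.imread("good_part_mask.png", cv2.IMREAD_GRAYSCALE)
sample = cv2.imread("aligned_sample_mask.png", cv2.IMREAD_GRAYSCALE)

diff = cv2.absdiff(template, sample)
# Morphological opening removes single-pixel alignment noise but keeps
# blob-sized defect clusters
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
diff = cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel)
defect_pixels = cv2.countNonZero(diff)
```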
There are machine learning techniques to identify chips on edges, but you'd need lots of part samples to train them. Optionally, if you don't have enough samples, you can reuse the same samples with slightly modified lighting, at different locations in the image, with manually added defects, etc., to help train the algorithm. But that's another discussion altogether.
3. Scratch on the ring's surface
See the link above about different types of lighting. You'll need to experiment with a few different lighting configurations to figure out what works for your part.
Generally, though, scratches are likely to have difference in brightness and "edginess" (image edge content) relative to the rest of the part. If you're lucky, a scratch can reveal a different color.
Scratches can vary so much in appearance, area, and shape that it would be hard to parameterize an algorithm to catch them all. Once again, statistical analysis of edge content, brightness, and color tends to be useful.
In general: to achieve the best results for a particular QC inspection, you'll need to engineer a system specifically for the part. Your vision system may be configurable, and there can be different combinations of lights and cameras for different types of QC inspection, but for any particular defect detection you want to control the appearance of the part as much as possible. Relying on software to do all the work yields a less robust system that customers will typically yank out and throw away.
4. Width of the ring is uneven
This is almost an example of dimensional gauging, a.k.a. optical gauging. If you're just looking for unevenness, you don't necessarily need to measure the diameter in engineering units such as millimeters: you can just measure pixels. BUT the effort required to ensure your measurement in pixels is accurate will typically lead you to measuring in millimeters anyway.
Assuming the optical setup is correct and (more or less) calibrated, which I'll describe below, here's a basic process:
Identify the location and orientation of the part
From the algorithm that finds the part, or from a follow-on algorithm that identifies edge pixels (e.g. Sobel, Canny, ...), find the edge pixels just for the outer diameter of the ring.
Perform a circle/ellipse fit to the edge pixels, and eliminate outlier pixels that don't actually belong to the circle/ellipse.
Have your algorithm start with the 1st pixel in the list of edge pixels corresponding to the outer diameter.
From that 1st pixel, find the edge pixel farthest away. Ideally, this would be the point diametrically opposite.
Cycle through all pixels, finding the distance to the farthest pixel. (This is not optimal in terms of speed, but simpler to code.)
Generate a histogram of all distances.
Make a determination of good/bad based on the histogram of point-to-point distances.
You might call a part "bad" for one or more of the following conditions:
At least N point-to-point distances exceed a distance of P pixels
The standard deviation of point-to-point distances exceeds some threshold T
...
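A brute-force sketch of the above (placeholder thresholds standing in for N, P, and T):

```python
import numpy as np

def uneven_width_check(pts, dist_limit=110.0, max_count=10, std_limit=1.5):
    """pts: (N, 2) array of outer-diameter edge pixels.

    Brute-force O(N^2) version: for every edge pixel, find the distance
    to the farthest edge pixel, then test the two conditions above.
    """
    dists = np.array([np.hypot(pts[:, 0] - p[0], pts[:, 1] - p[1]).max()
                      for p in pts])
    too_long = np.count_nonzero(dists > dist_limit)  # N distances exceed P
    too_spread = dists.std() > std_limit             # std dev exceeds T
    return too_long > max_count or too_spread
```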
Measurement of distance depends on the consistency of point-to-point distances at different locations within the image. If you perform accurate, precise measurements of distance, you'll notice that an object of fixed length appears to vary in length depending on its location in the image: if the object is located in the center of the image it may appear to be 57.5 pixels long, but in one corner of the image it may appear to be 56.2 pixels long.
To correct for these irregularities, you can...
Perform a nonlinear flatness correction. This will also correct for non-normal alignment of the camera to the part, though you want to start with the optical axis of the camera as normal (perpendicular) to the surface of the part as possible.
Make a few quick measurements to estimate how much measurements vary.
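One common way to implement the nonlinear correction is standard lens calibration with a checkerboard target; a rough sketch (the board geometry and filenames are placeholder assumptions):

```python
import cv2
import numpy as np

# Checkerboard with 9x6 inner corners and 5 mm squares, imaged at several
# positions across the field of view
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * 5.0  # millimeters

obj_pts, img_pts, size = [], [], None
for fname in ["cal1.png", "cal2.png", "cal3.png"]:
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (9, 6))
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]

_, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)

# Undistort part images before measuring anything in pixels
undistorted = cv2.undistort(cv2.imread("ring.png"), K, dist)
```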
5. Dent on the ring
6. Entire surface of the ring is not completely horizontal to the plane; perhaps due to a dent, part of the ring rests on the plane surface, creating a 1- or 2-degree angle (I have marked no. 6 as 'uneven surface' in the attached picture)
Use cameras imaging from the sides. Make sure the background is simple.
A 1- to 2-degree difference could be hard to detect using a camera placed directly overhead. If you're lucky you could detect that the outer edge of the part is more elliptical than circular, but the ability to detect this would depend on the color and thickness of the part. Also, you wouldn't necessarily be able to distinguish between a misshapen part and one resting at an angle--but for some inspections that's okay, since both are defects.
HOWEVER, in a real-world application the customer might not be happy if you reject parts that are otherwise good, but happen to be sitting at a slight angle. A mechanical fixture might fix the problem by ensuring parts lie flat.
I have also attached another picture marking the quality issues found on a random ring: elevated view with marked QC issues
The image isn't clear enough. Put the part on a simpler background and tinker with lighting to make it more obvious what the differences are between good and bad.
One single ring can have one or more of the above-mentioned 6 defects
Run one algorithm after the other. You may also have to turn different lights on and off before running each algorithm (or rather, each chain of algorithms).
Issue 1 & 3 can occur at either surface of the ring and we need to check both the surfaces
We need to QC on one single ring at a time
You may have to write an algorithm to detect whether multiple rings happen to be present. Even if you weren't asked to do this specifically, this happens in production, and your professor may surprise you with it. At least have an idea how you would detect the presence of multiple rings.
That's another aspect of vision: you may start thinking of what algorithms and lighting are necessary to solve "the problem," but you'll also spend a lot of time figuring out everything that could go wrong, and writing software to detect those conditions to ensure you don't yield a false result. For example, what happens if the lights turn off? What if two rings are present? What if the ring isn't fully within the field of view? What if dirt gets on the surface the part is resting on? What if the lens gets dirty (which it will)?
A few principles:
Provide the best image for image processing before you consider what algorithm would work best.
Understand what accuracy/success rate is necessary, and measure it.
Get as many samples as you possibly can: hundreds, thousands if possible. Having a chance to measure "online" (in real production) is helpful.
Real-world applications
If it were a real-world application--that is, if you went into the field of vision professionally--there are many more steps that may seem less difficult, but that turn out to be critical:
How rings come into view (or into "station"): on a moving conveyor? placed by a robot? in some container?
What triggers vision inspection of the ring -- a programmable logic controller, a "light curtain" the ring passes through, or whether the vision system itself has to determine when a ring is ready for inspection.
How results are communicated to other equipment. (This can be a huge hassle, and an otherwise good vision system can be rejected by a customer if communications aren't designed and implemented properly.)
Whether you are guaranteed to see only one ring at a time
This isn't to say university isn't the real world: just that you probably won't lose tens or hundreds of thousands of Euros/pounds/dollars if you happen to overlook something.
You can look at how face recognition is done:
Face detection.
Face alignment and normalization.
Features extraction.
Comparing features with pattern.
But in your case, you can skip step 3 and compare the output of step 2 with the reference image. Depending on the conditions, additional filtering may be necessary.
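A minimal sketch of that simplified pipeline, using OpenCV's bundled Haar cascade (the filenames and the 128x128 normalization size are assumptions):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def normalized_face(path, size=(128, 128)):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, 1.1, 5)
    x, y, w, h = faces[0]  # assume exactly one face; no error handling
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    return cv2.equalizeHist(face)  # crude lighting normalization

ref = normalized_face("reference.png")
probe = normalized_face("probe.png")

# Equal-size images -> matchTemplate returns a single correlation score
score = cv2.matchTemplate(probe, ref, cv2.TM_CCOEFF_NORMED)[0][0]
print("similarity:", score)
```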
Is there any good way to detect the holograms inside security documents like identity cards? I've tried quite a few methods, such as the Sobel filter and the Laplacian, among others, but it's still pretty hard to tell if the card has a hologram over it.
Original Image
From left to right: Laplacian, SobelX, SobelY
What makes a hologram different from the normal print is that it looks different from different angles. It also looks different under different lighting.
I would try to take two pictures with the light coming from different sides. (Or turn the card 180 degrees). Then adjust the background and subtract the two images.
If this is for a mobile application (i.e. a smartphone), the camera needs to take pictures from different angles. The application would have to take sample images while the user moves the phone around the card. It detects the card outline, maps it to a rectangle, and then attempts to subtract images until the holograms are found. In effect, the reduced mechanical effort is traded for significantly more complicated software.
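A minimal sketch of the two-exposure subtraction idea (the threshold is a placeholder, and it assumes the card doesn't move between the two shots):

```python
import cv2

a = cv2.imread("card_light_left.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("card_light_right.png", cv2.IMREAD_GRAYSCALE)

# Crude exposure adjustment: match the mean brightness of the two shots
b = cv2.convertScaleAbs(b, alpha=float(a.mean()) / float(b.mean()))

# Where the two lighting conditions disagree most is where the hologram
# (or a specular highlight) lives
diff = cv2.absdiff(a, b)
_, hologram = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)

cv2.imshow("hologram candidates", hologram)
cv2.waitKey(0)
```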
I am building a system which detects coins that are picked up from a tray. This tray will be kept in a public place. People will pick up one or more coins, but would be expected to keep them back after some time.
I would have a live stream through a webcam placed at the top. I will have a calibration step, say at the beginning of the day, that captures the initial state of the tray to be used for comparing with the live feed. A few slots might be empty to begin with, as you can see in the sample image.
I need to detect slots that had a coin initially, but are missing the same at any given point of time during the day.
I am trying out a few approaches using OpenCV:
SSIM difference: I can use SSIM to find diff between my live image frame and initial state. However, a number of slots are larger than the corresponding coin sizes (e.g. top two rows). This could mean that if the coin was originally placed at the center, but was later put back to touch one of the edges, we may get a false positive.
Blob detection: Alternatively, I can pre-feed (or detect) slot coordinates, then do blob detection within every slot. If a blob was present in the original state but is missing in a camera frame, this would mean a coin has been picked up. However, accurate blob detection could be a challenge if the contrast between the coin and the tray is low.
I might also need to watch out for slight variations in lighting due to shadows of people moving around.
Any thoughts on these or any pointers on alternate approaches that can be tried out? Is there any analogous implementation that I can learn from?
Many thanks in advance.
Edit: Thanks to @I.Newton's suggestion. For those who stumble upon this question and would benefit from a sample implementation, look here: https://github.com/kewats/computer-vision-samples/tree/master/image-processing/missing-coins-detection
If you have complete control over the lighting conditions, you can use simple color thresholding to solve the problem.
First, make a mask for the boxes. You can do this in multiple ways: by color threshold, adaptive threshold, Canny edges, etc. I did it by color thresholding.
Then make a mask for the coins by the same method.
Now flood-fill your box mask from the center of each of these coins. It'll retain only the boxes which do not have coins.
Now you can compare this with your initial mask to figure out if all the coins are present.
This does not include frame subtraction, so you need not worry about the coin sitting in a different position within the box. The only thing you need to ensure is consistent lighting conditions when making the masks. If you want to make sure the coins are returned to the same box, you should go for template matching etc., which again needs more effort.
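For reference, a rough sketch of this approach (both HSV ranges are placeholders to tune under your controlled lighting; it assumes a coin and its slot form one connected blob once the masks are combined):

```python
import cv2
import numpy as np

img = cv2.imread("tray.png")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Masks for the boxes and the coins (placeholder ranges)
box_mask = cv2.inRange(hsv, np.array([0, 0, 0]), np.array([180, 255, 80]))
coin_mask = cv2.inRange(hsv, np.array([15, 60, 100]), np.array([35, 255, 255]))

# OR the masks so a slot and the coin sitting in it form one connected blob
slots = cv2.bitwise_or(box_mask, coin_mask)

# Flood-fill from each coin's centroid: slots holding a coin get erased,
# so only empty slots remain white
contours, _ = cv2.findContours(coin_mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
h, w = slots.shape
for c in contours:
    M = cv2.moments(c)
    if M["m00"] == 0:
        continue
    cx, cy = int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])
    cv2.floodFill(slots, np.zeros((h + 2, w + 2), np.uint8), (cx, cy), 0)

empty_now = cv2.countNonZero(slots)  # compare against the calibration frame
```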