I am building a system that detects coins picked up from a tray. The tray will be kept in a public place; people will pick up one or more coins, but are expected to put them back after some time.
I will have a live stream from a webcam placed above the tray, and a calibration step, say at the beginning of the day, that captures the initial state of the tray to compare against the live feed. A few slots might be empty to begin with, as you can see in the sample image.
I need to detect slots that had a coin initially but are missing it at any given point during the day.
I am trying out a few approaches using OpenCV:
SSIM difference: I can use SSIM to find the difference between a live frame and the initial state. However, a number of slots are larger than the corresponding coins (e.g. the top two rows). This means that if a coin was originally placed at the center of its slot but later put back touching one of the edges, we may get a false positive.
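For reference, a minimal sketch of that comparison using scikit-image's SSIM; the file names are placeholders, and the camera is assumed fixed so the two frames align:

import cv2
from skimage.metrics import structural_similarity

# Placeholder file names: the calibration shot and a live frame, grayscale.
calib = cv2.imread("calibration.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

score, diff = structural_similarity(calib, frame, full=True)
diff = (diff * 255).astype("uint8")  # dark where the images differ
_, changed = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
# White regions in `changed` are candidate changes - including the false
# positives from a coin merely shifted within an oversized slot.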
Blob detection: Alternatively, I can pre-feed (or detect) slot coordinates and then run blob detection within each slot. If a blob was present in the original state but is missing in a camera frame, a coin has been picked up. However, accurate blob detection could be a challenge if the contrast between the coin and the tray is low.
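A sketch of the per-slot blob check; the slot coordinates, file names, and detector parameters below are hypothetical placeholders:

import cv2

calib = cv2.imread("calibration.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

slots = [(40, 30, 90, 90), (150, 30, 90, 90)]  # hypothetical (x, y, w, h) boxes

params = cv2.SimpleBlobDetector_Params()
params.filterByArea = True
params.minArea = 200  # tune to the coin size in pixels; by default dark
detector = cv2.SimpleBlobDetector_create(params)  # blobs are detected

def has_coin(img, box):
    x, y, w, h = box
    return len(detector.detect(img[y:y + h, x:x + w])) > 0

for box in slots:
    if has_coin(calib, box) and not has_coin(frame, box):
        print("coin picked up from slot", box)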
I might also need to watch out for slight variations in lighting due to shadows of people moving around.
Any thoughts on these, or pointers to alternative approaches worth trying? Is there an analogous implementation I can learn from?
Many thanks in advance.
Edit: Thanks to @I.Newton's suggestion. For those who stumble upon this question and would benefit from a sample implementation, look here: https://github.com/kewats/computer-vision-samples/tree/master/image-processing/missing-coins-detection
If you have complete control over the lighting conditions, you can use simple color thresholding to solve the problem.
First make a mask for the boxes. You can do this in multiple ways: by color threshold, adaptive threshold, Canny edge detection, etc. I did it by color threshold.
Then make a mask for the coins by the same method.
Now flood fill your box mask from the center of each of these coins. It will retain only the boxes that do not have coins.
Now you can compare this with your initial mask to figure out whether all the coins are present.
This does not involve frame subtraction, so you need not worry about the coin's position within the box. The only thing you need to control is the lighting when making the masks. If you also want to make sure the coins are returned to the same box, you should go for template matching or similar, which again needs effort.
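A minimal sketch of these steps, assuming OpenCV 4 and controlled lighting; the HSV ranges and file names are placeholders you would tune for your own tray and coins:

import cv2

calib = cv2.imread("calibration.png")  # initial state of the tray
frame = cv2.imread("frame.png")        # current camera frame

def box_mask(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, (0, 0, 120), (180, 60, 255))   # tray colour (placeholder)

def coin_mask(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, (10, 80, 80), (30, 255, 255))  # coin colour (placeholder)

# Coin centres from the calibration image.
contours, _ = cv2.findContours(coin_mask(calib), cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
centres = []
for c in contours:
    m = cv2.moments(c)
    if m["m00"] > 0:
        centres.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))

# Flood fill the current box mask from each original coin centre. If the
# coin is still there it occludes the box, the seed pixel is black and
# nothing fills; if it is missing, the box interior fills and we report it.
boxes = box_mask(frame)
for cx, cy in centres:
    if boxes[cy, cx] == 255:
        cv2.floodFill(boxes, None, (cx, cy), 128)
        print(f"coin missing near ({cx}, {cy})")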
My experiment involves subjecting a substance to pressure that eventually makes it crack. The crack grows with time and applied pressure. I have a set-up to take a picture of the substance at fixed intervals of time.
I need to measure how fast the crack grows. How do I go about this? (I can code in Python.)
Is there a way to measure the live speed, or the growth rate of the crack from one frame to the next?
Google drive link to series of pictures taken - https://drive.google.com/open?id=189cv8B4rm3lhSgT6OYfI_aN0Xmqi-tYi
Kindly advise.
I tried floodFill from OpenCV as per the suggestions to this question, but the returned mask is as shown:
import cv2
import numpy as np

# resized: the input frame, loaded and resized earlier
h, w = resized.shape[:2]
mask = np.zeros((h + 2, w + 2), np.uint8)
seed = (int(w / 2), int(h / 2))
floodflags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)

# Flood fill from the centre seed point
num, im, mask, rect = cv2.floodFill(resized, mask, seed, (255, 0, 0), (10,) * 3, (10,) * 3, floodflags)
I thought that if I can get the coordinates of the rectangular bounding box that encloses the crack, I can track those coordinates across frames and measure the size of the crack, and eventually the speed.
I tried thresholding as below:
th, im_th = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)
This gives:
I'm unsure if this will let me filter out the background and draw a bounding box over the crack alone. Please advise.
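For the bounding-box idea above, a sketch, assuming im_th is a single-channel binary image in which the crack is the largest bright region:

import cv2

# Find the largest contour in the binary image and its bounding box.
contours, _ = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
crack = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(crack)

# Track (x, y, w, h) across frames; the change in w (or h) per frame,
# times a mm-per-pixel calibration, gives the growth speed.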
Thanks in advance.
Depending on how slowly the crack forms, you probably don't need video; you'll likely wind up sampling every X frames anyway and throwing the extra frames away. What you want is enough frames to capture "incremental" changes in the crack without capturing so many that processing becomes too computationally expensive.
If you can carefully control the lighting conditions in your setup, then you're in luck! This becomes a very simple problem. You can take a histogram of the pixels (OpenCV has functions for this, but so do PIL and numpy); you should get two families of color: one that is the color of the outside of the substance, and another that is biased by the shadow in the crack.
You can also try dramatically increasing the contrast in each image/frame in order to get a binary mask of the crack, or running an edge detector over the image. These techniques will lead to frames that are substantially easier to process than the raw footage. You can even feed these into a skeletonization process in order to generate a vector-based representation of the line, in XY image coordinates.
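A sketch of that pipeline, with Otsu's method picking the threshold between the two pixel families and scikit-image doing the skeletonization; the file name is a placeholder:

import cv2
import numpy as np
from skimage.morphology import skeletonize

img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder frame
img = cv2.equalizeHist(img)  # dramatically increase contrast

# Otsu picks the threshold between the two pixel families automatically;
# INV makes the dark crack white in the mask.
_, crack = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# One-pixel-wide line, as XY image coordinates.
skeleton = skeletonize(crack > 0)
ys, xs = np.nonzero(skeleton)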
If you can't control the lighting, or the sample is a similar color to the crack, you'll probably need object detection techniques. It's unlikely there's an existing "crack detector," so you may need to build your own, or look for another detector that serves as a good proxy for the color and shape of the forming crack.
I'd highly recommend trying the first option if at all possible; pixel and histogram math is far easier than other techniques.
I appreciate you are only just getting started, but you have some issues with your video. Firstly, the lighting is not the best, and it is not consistent because people are moving around in front of it and casting shadows. It also doesn't illuminate the background behind the crack well; it would be better if it was at the height of the crack and shining more into it, so that it better illuminates the background behind the crack. Secondly, you could do without the camera moving partway through the experiment!
Thirdly, if you want to measure things you need to calibrate, which at the very least means putting a ruler in the image, or scale lines on your background at fixed intervals. If you are doing all that, you may as well make life easy for yourself and put markers of a specific colour/pattern, both different, on the top and bottom of the frame plates that are applying the load.
Finally, you want to do something like a floodFill, or a fill just within the confines of your material (probably by masking), to fill the crack with a different colour. It is then pretty simple to measure the length of the crack and its left-most extent.
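A sketch of that fill-and-measure step; the seed point and file name are placeholders (pick any pixel known to lie inside the crack):

import cv2
import numpy as np

img = cv2.imread("frame_000.png")  # placeholder frame
seed = (620, 240)                  # hypothetical point inside the crack
cv2.floodFill(img, None, seed, (0, 0, 255), (10,) * 3, (10,) * 3)

# Crack pixels are now pure red (BGR); measure their extent.
filled = np.all(img == (0, 0, 255), axis=2)
ys, xs = np.nonzero(filled)
if xs.size:
    print("left-most x:", xs.min(), "length in px:", xs.max() - xs.min())
# Convert pixels to real units with the ruler/scale-line calibration.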
With a proper segmentation approach you are going to have a detailed geometry of the object extracted from a single frame. For example:
If you process multiple frames, you will be able to see the geometry evolve in time. Having that, it should be easy to compare polygons to find form changes, cracks, etc.
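For instance, given binary masks of the segmented substance from two frames (placeholder file names), Hu-moment shape matching gives a single number that grows as the outline deforms:

import cv2

a = cv2.imread("mask_t0.png", cv2.IMREAD_GRAYSCALE)  # placeholder masks
b = cv2.imread("mask_t1.png", cv2.IMREAD_GRAYSCALE)

ca, _ = cv2.findContours(a, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cb, _ = cv2.findContours(b, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Distance between the largest outlines of the two frames.
d = cv2.matchShapes(max(ca, key=cv2.contourArea),
                    max(cb, key=cv2.contourArea),
                    cv2.CONTOURS_MATCH_I1, 0)
print("shape change:", d)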
I used to work with 4K video to get all required details and good accuracy. You might not need all that data, but video is still way more flexible.
Here is a complete example: https://youtu.be/g2KyfrBtTA4
Provide some examples if you want to get more detailed recommendations.
Update
Real examples are always helpful. So you can segment a crack:
or a substance:
or both:
Basically, you need to enhance the overall quality of the input (focus, background under the substance, etc.).
As Mark Setchell showed, you might get unwanted background as part of the result shape (the right side of the crack), so it is better to make sure that will not happen or just try to analyze only the substance.
Anyway, your task doesn't seem complex. It might even be trivial if you can improve the image quality and simplify the environment somewhat (a specific background, etc.).
I am reading the slides for temporal filtering in a computer vision class (page 108), and I am wondering: how can we do temporal filtering for videos?
For example, they say our data is a video in XYT, where X, Y are the spatial dimensions and T is time.
"How could we create a filter that keeps sharp objects that move at some velocity (vx, vy) while blurring the rest?"
They then derive the formula for that, but I'm confused about how to apply it.
How can we do filtering in the Fourier domain, and how should we apply it in general? Can someone please help me with how to code it?
In that example, they're talking about a specific known speed. For example, if you know that a car is moving left at 2 pixels per frame. It's possible to make a video that blurs everything except that car.
Here's the idea: start at frame 0 of the video. At each pixel, look one frame in the future and 2 pixels left. You will be looking at the same part of the moving car. Now, imagine you take the average color value of your current pixel and the future pixel (the one that is 2 pixels left and 1 frame in the future). If your pixel is on the moving car, both pixels will be exactly the same color, so taking the average has no effect. On the other hand, if it's NOT on the moving car, they'll be different colors, and so taking the average will have the effect of blurring between them.
Thus, the pixels of the car will be unchanged, but the rest of the video will get a blur. Repeat for each frame. You can also include more frames in your filter; e.g. you could look 2 frames in the future and 4 pixels left, or 1 frame in the past and 2 pixels right.
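A sketch of that filter in NumPy; video is a hypothetical (frames, height, width) grayscale array, and np.roll wraps around at the borders, which a real implementation would handle properly:

import numpy as np

def velocity_filter(video, vx=-2, vy=0):
    # Pixel (x, y) at time t is paired with (x + vx, y + vy) at t + 1,
    # i.e. the same spot on an object moving at (vx, vy) pixels/frame.
    future = np.roll(video, shift=(-1, -vy, -vx), axis=(0, 1, 2)).astype(np.float32)
    # Averaging leaves the tracked object unchanged and blurs the rest.
    out = (video.astype(np.float32) + future) / 2
    return out.astype(video.dtype)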
Note: this was a teaching example; I don't think there are many real computer vision applications for this (at least, not as a standalone technique), because it's so fragile. If the car speeds up or slows down slightly, it gets blurred.
Is there any good way to detect the holograms inside security documents like identity cards? I've tried quite a few methods, such as the Sobel filter and the Laplacian, among others, but it's still pretty hard to tell if the card has a hologram on it.
Original Image
From left to right: Laplacian, SobelX, SobelY
What makes a hologram different from the normal print is that it looks different from different angles. It also looks different under different lighting.
I would try to take two pictures with the light coming from different sides. (Or turn the card 180 degrees). Then adjust the background and subtract the two images.
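A sketch of that idea; the file names are placeholders, and the two shots are assumed to show the card in the same position (or to have been warped into alignment first):

import cv2

a = cv2.imread("card_light_left.png", cv2.IMREAD_GRAYSCALE)
b = cv2.imread("card_light_right.png", cv2.IMREAD_GRAYSCALE)

# Roughly equalize overall exposure before subtracting.
a = cv2.equalizeHist(a)
b = cv2.equalizeHist(b)

# The print stays roughly the same between shots; the hologram changes,
# so it dominates the difference image.
diff = cv2.absdiff(a, b)
_, holo = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)  # tune the threshold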
If this is for a mobile application (i.e. a smartphone), the camera needs to take pictures from different angles. The application would have to take sample images while the user moves the phone around the card. It detects the card outline, maps it to a rectangle, and then attempts to subtract images until the holograms are found. Apparently the reduced mechanical effort translates into significantly more complicated software.
I'm trying to extract meaningful information from video streams of Super Smash Bros. for Wii U, a fighting game with a very sparse UI.
Example screenshot
From this I want to tell the number of players, their character names, and their current damage (the large percentage number). Everything I've tried so far has failed, because so few elements of the UI are static:
Some videos are within overlays and may be scaled and moved
Most matches contain 2 players, but may contain up to 8 players
Character portraits often fade out to nearly transparent.
Character names may be very short ('Ike') or very long ('Mr. Game & Watch'), so they overlap the edges of the triangle-shaped box they're in.
The box behind the character name varies in color; it is commonly, but not always, red and blue (in 2-player matches).
The game behind the UI is extremely noisy and may even be completely black or white at times.
The large number text changes from white to a red gradient as the value increases.
The large number text is completely absent when a player has been KOed.
I've tried the following things:
Template matching. Even with (slow) multi-scale matching, the percent sign changes in position and color so often that it requires a low matching threshold, in turn producing noisy results. (A sketch of this loop appears after this list.)
Trying to find the character name by thresholding and finding horizontally connected contours. This fails when the background is very bright, and it also often matches undesirable elements on the stream overlay.
Finding edges and contours to locate the triangle-shaped background behind the player name. Again, this failed because the background is very noisy (often there is a red player against a red background with no discernible edge).
Feature matching. There are hundreds of possible portraits, and the character-name-text (which is relatively static) is very small so there are few features available for matching.
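For reference, a sketch of the multi-scale template-matching loop from the first attempt; the template file and the 0.6 threshold are placeholders, and the low threshold is exactly what produces the noisy hits described above:

import cv2
import numpy as np

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)       # placeholder frame
template = cv2.imread("percent.png", cv2.IMREAD_GRAYSCALE)  # percent-sign crop

hits = []
for scale in np.linspace(0.5, 1.5, 11):
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] > frame.shape[0] or t.shape[1] > frame.shape[1]:
        continue
    res = cv2.matchTemplate(frame, t, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(res >= 0.6)  # low threshold -> many noisy matches
    hits += [(x, y, scale) for x, y in zip(xs, ys)]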
I don't have any formal training in computer vision tasks, so I'm not sure how to progress. It seems like this should be a relatively straightforward task given the elements are 2d and never rotate or skew, but I know that's a dangerous assumption to make.
If anyone could point me in the right direction, I'd really appreciate it. No language preference, but I have been using python.
I want to use OpenCV to detect when a person raises or lowers a hand, or both hands. I have looked through the tutorials provided with Python OpenCV and none of them seem to do the job. There is a camera that sits in front of the two people, about 50 cm away from them (so you see them from the waist up). Each person is able to raise or lower each arm, or both arms, and I have to detect when they do that. (The camera is mounted on the bars of the rollercoaster, which means the background is always changing.)
How can I detect this in the fastest time possible? It does not have to be real-time detection, but it must not take more than 0.5 seconds. The whole image is 640x480. Since the hands can appear only in the top of the image, this reduces the search area by half, to 640x240. The problem then reduces to searching for a certain object (the hands) against a constantly changing background.
Thank you,
Stefan F.
You can try this very basic but effective and fast solution,
on the upper half of the image:
Canny edge detection
morphologyEx with an adequate structuring element (a simple combination of erode/dilate may also be enough)
convert to black and white using an adaptive threshold
XOR the result with a mask representing the expected covered area.
The number of ones the XOR returns in each area of the mask is the measure you should use.
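A minimal sketch of those steps; "frame.png" and "expected_mask.png" are placeholders, where the mask is a binary image, the same size as the upper half, marking where raised arms should appear:

import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
top = frame[:240, :]  # upper half of the 640x480 image

edges = cv2.Canny(top, 50, 150)  # 1. Canny edge detection
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)  # 2. connect fragments

# 3. An adaptive threshold of the raw image is an alternative binary input
#    (you could XOR `bw` instead of `edges` below).
bw = cv2.adaptiveThreshold(top, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 11, 2)

# 4. XOR against the expected-coverage mask and count mismatches per area.
mask = cv2.imread("expected_mask.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
diff = cv2.bitwise_xor(edges, mask)
left_score = cv2.countNonZero(diff[:, :320])   # left person's area
right_score = cv2.countNonZero(diff[:, 320:])  # right person's area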
This is extremely fast; you can run more than one iteration within the 0.5 s and use the average. You may also detect faces and use them to adapt the position of your mask, but this will be more expensive :)
Hope that helps.