Detect Hand using OpenCV

Detect Hand using OpenCV - python

I want to use openCV to detect when a person raises or lowers a hand or both hands. I have looked through the tutorials provided by python opencv and none of them seem to do the job. There is a camera that sits in front of the 2 persons, about 50cm away from them(so you see them from the waist up). The person is able to raise or lower each arm, or both of the arms and I have to detect when they do that.(the camera is mounted on the bars of the rollercoaster; this implies that the background is always changing)
How can I detect this in the fastest time possible? It does not have to be real time detection but it does not have to be more than 0.5seconds. The whole image is 640x480. Now, since the hands can appear only in the top of the image, this would reduce the search area by half => 640x240. This would reduce to the problem of searching a certain object(the hands) in a constantly changing background.
Thank you,
Stefan F.

You can try the very basic but so effective and fast solution:
on the upper half of the image:
canny edge detection
morphologyEx with adequate Structuring element(also simple combination of erode/dilate may be enough)
convert to BW using adaptive threshold
Xor the result with a mask representing the expected covered area.
The number of ones returned by xor in each area of the mask is the index that you should use.
This is extremely fast, you can make more than one iteration within the 0.5 sec and use the average. also you may detect faces and use them to adapt the position of your mask, but this will be more expensive :)
hope that helps

Related

How to find center coordinates of numbers in an image

I'm currently working on my first assignment in image processing (using OpenCV in Python). My assignment is to calculate a precise score (to tenths of a point) of one to several shooting holes in an image uploaded by a user. One of the requirements is to transform the uploaded shooting target image to be from "birds-eye view" for further processing. For that I have decided that I need to find center coordinates of numbers (7 & 8) to select them as my 4 quadrilateral.
Unfortunately, there are several limitations that need to be taken into account.
Limitations:
resolution of the processed shooting target image can vary
the image can be taken in different lighting conditions
the image processed by this part of my algorithm will always be taken under an angle (extreme angles will be automatically rejected)
the image can be slightly rotated (+/- 10 degrees)
the shooting target can be just a part of the image
the image can be only of the center black part of the target, meaning the user doesn't have to take a photo of the whole shooting target (but there always has to be the center black part on it)
this algorithm can take a maximum of 2000ms runtime
What I have tried so far:
Template matching
here I quickly realized that it was unusable since the numbers could be slightly rotated and a different scale
Feature matching
I have tried all of the different feature matching types (SIFT, SURF, ORB...)
unfortunately, the numbers do not have that specific set of features so they matched a quite lot of false positives, but I could possibly filter them by adding shape matching, etc..
the biggest blocker was runtime, the runtime of only a single number feature matching took around 5000ms (even after optimizations) (on MacBook PRO 2017)
Optical character recognition
I mostly tried using pytesseract library
even after thresholding the image to inverted binary (so the text of numbers 7 and 8 is black and the background white) it failed to recognize them
I also tried several ways of preprocessing the image and I played a lot with the tesseract config parameter but it didn't seem to help whatsoever
Contour detection
I have easily detected all of the wanted numbers (7 & 8) as single contours but failed to filter out all of the false positives (since the image can be in different resolutions and also there are two types of targets with different sizes of the numbers I couldn't simply threshold the contour by its width, height or area)
After I would detect the numbers as contours I wanted to extract them as some ROI and then I would use OCR on them (but since there were so many false positives this would take a lot of time)
I also tried filtering them by using cv2.matchShapes function on both contours and cropped template / ROI but it seemed really unreliable
Example processed images:
high resolution version here
high resolution version here
high resolution version here
high resolution version here
high resolution version here
high resolution version here
As of right now, I'm lost on how to progress about this. I have tried everything I could think of. I would be immensely happy if any of you image recognition experts gave me any kind of advice or even better a usable code example to help me solve my problem.
Thank you all in advance.

Find the black disk by adaptive binarization and contour (possibly blur to erase the inner features);
Fit an ellipse to the outline, as accurate as possible;
Find at least one edge of the square (Hough lines);
Classify the edge as one of NWSE (according to angle);
Use the ellipse and the line information to reconstruct the perspective transformation (it is an homography);
Apply the inverse homography to straighten the image and obtain the exact target center and axis;
Again by adaptive binarization, find the bullet holes (center/radius);
Rate the holes after their distance to the center, relative to the back disk radius.
If the marking scheme is variable, detect the circles (Hough circles, using the known center, or detect peaks in an oblique profile starting from the center).
If necessary, you could OCR the digits, but it seems that the score is implicitly starting at one in the outer ring.

Edge detection Gray image

i need some advice in a computer vision projekt that i am working on. I am trying to extract a corner in the image below. The edge im searching for is marked yellow in the right image. The edge detection is always failing because the edge is too blurred in the middle.
I run this process with opencv and python.
I started to remove the white dots with a threshold method. After that a big median blur (31-53). After that a adaptive Threshod method to seperate the areas left and right from the corners. But the sepearation is always bad because the edge is barely visible.
Is there some other way to extract this edge or do i have to try with a better camera?
Thanks for your help.

First do you have other dataset? because it is hard to discuss it just from 1 input.
Couple things that you can do.
The best is you change the camera of imaging technique to have a better and clear edge.
When it is hard to do so. Try model-based fitting.If you image is repeatable in all class. I can observe some circles on the right and 2 sharp straight-line edges on the left. Your wanted red soft edge circle is in the middle of those 2 apparent features. That can be considered as a model. then you can always use some other technique for the pixel in-between those 2 region(because they are easy to detect) . Those technique includes but not limit to histogram equalization, high pass filter or even wavelet transform.
The Wost way is to use parameter fitting to do. What you want to segment is sth not a strong edge and sth not a smooth plane. So you can tweak the canny edge detect to find those edge which is not so strong. I do not support this method. If you really no choice and no other image, then you can try it.
Last way is to use deep learning based method to train and auto segment this part out. This method might work. but it needs you to have hundred if not thousands of dataset and labels.
Regards
Shenghai Yuan

Measure the rate of growth of a crack from Video

My experiment involves subjecting a substance to pressure that makes the substance eventually crack. The crack grows with time and pressure applied. I have a set-up to take a picture of the substance at fixed intervals of time.
I need to measure how fast crack grows.How do I go about this? (I can code in Python).
Is there a way to measure live speed or speed of growth of crack from one frame to another?
Google drive link to series of pictures taken - https://drive.google.com/open?id=189cv8B4rm3lhSgT6OYfI_aN0Xmqi-tYi
Kindly advise.
I Tried floodFill from OpenCV as per suggestions to this question. But the returned mask is as shown:
h, w = resized.shape[:2]
mask = np.zeros((h+2, w+2), np.uint8)
seed = (int(w/2),int(h/2))
# Floodfill from point (0, 0)
num,im,mask,rect = cv2.floodFill(resized, mask, (0,0), (255,0,0), (10,)*3, (10,)*3, floodflags)
I thought if I can get the co-ordinates of the rectangle bounding box that encloses the crack, I can track its co-ordinates across frames and measure the size of the crack and eventually the speed.
I tried thresholding as below:
th, im_th = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY);
This gives:
I'm unsure if this will let me filter out the background and draw a bounding box over the crack alone. Please advise.
Thanks in advance.

Depending on how slowly the crack forms, you probably don't need a video; you'll likely wind up sampling every X frames anyway, and throwing all of the extra frames away. What you want is enough frames to get "incremental" changes in the crack without getting too many frames that it becomes too computationally expensive.
If you can carefully control the lighting conditions in your setup, then you're in luck! This becomes a very simple problem. You can take a histogram of the pixels (openCV has handles for this, but so does PIL and numpy); you should get two families of color; one that is the color of the outside of the substance, and another that is biased by the shadow in the crack.
You can also try dramatically increasing the contrast in each image/frame in order to get a binary mask of the crack, or running an edge detector over the image. These techniques will lead to frames that are substantially easier to process than the raw footage. You can even feed these into a skeletonization process in order to generate a vector-based representation of the line, in XY image coordinates.
If you can't control the lighting, or the sample is a similar color to the crack, you'll probably need to use object detection techniques, but it's unlikely there's an existing "crack detector," so you may either need to build your own, or look for what other detectors serve as a good proxy for the color and shape of the forming crack.
I'd highly recommend trying the first option if at all possible; pixel and histogram math is far easier than other techniques.

I appreciate you are only just getting started but you have some issues with your video. Firstly the lighting it is not best and it is not consistent because people are moving around in front of it and casting shadows - it also doesn't illuminate the the background behind the crack best - it would be better if it was at the height of the crack and shining more into it so that it better illuminates the background behind the crack. Secondly, you could do without the camera moving part way through the experiment!
Finally, if you want to measure things you need to calibrate, which at the very least means putting a ruler in the image - or scale lines on your background at fixed intervals. If you are doing all that you may as well make life easy for yourself and put markers of a specific colour/pattern, both different, on the top and bottom of the frame plates that are applying the load.
Finally then, you want to do something like a floodfill, or a fill just within the confines of your material (probably by masking) to fill the crack with a different colour. It is then pretty simple to measure the length of the crack and the left-most extent of the crack.

With a proper segmentation approach you are going to have a detailed geometry of the object extracted from a single frame. For example:
If you process multiple frames you will be able to see geometry evolution in time. Having that it should be easy to compare polygons to find form changes, cracks, etc:
I used to work with 4K video to get all required details and good accuracy. You might not need all that data, but video is still way more flexible.
Here is a complete example: https://youtu.be/g2KyfrBtTA4
Provide some examples if you want to get more detailed recommendations.
Update
Real examples are always helpful. So you can segment a crack:
or a substance:
or both:
Basically, you need to enhance overall quality of the input (focus, background under the substance, etc).
As Mark Setchell showed, you might get unwanted background as part of the result shape (the right side of the crack), so it is better to make sure that will not happen or just try to analyze only the substance.
Anyway, your task doesn't seem to be complex. It might be trivial if you can improve image quality and do some simplifications to the environment (some specific background, etc).

OpenCV find subjective contours like the human eye does

When humans see markers suggesting the form of a shape, they immediately perceive the shape itself, as in https://en.wikipedia.org/wiki/Illusory_contours. I'm trying to accomplish something similar in OpenCV in order to detect the shape of a hand in a depth image with very heavy noise. In this question, assume that skin color based detection is not working (actually it is the best I've achieved so far but it is not robust under changing light conditions, shadows or skin colors. Also various paper shapes (flat and colorful) are on the table, confusing color-based approaches. This is why I'm attempting to use the depth cam instead).
Here's a sample image of the live footage that is already pre-processed for better contrast and with background gradient removed:
I want to isolate the exact shape of the hand from the rest of the picture. For a human eye this is a trivial thing to do. So here are a few attempts I did:
Here's the result with canny edge detection applied. The problem here is that the black shape inside the hand is larger than the actual hand, causing the detected hand to overshoot in size. Also, the lines are not connected and I fail at detecting contours.
Update: Combining Canny and a morphological closing (4x4 px ellipse) makes contour detection possible with the following result. It is still waaay too noisy.
Update 2: The result can be slightly enhanced by drawing that contour to an empty mask, save that in a buffer and re-detect yet another contour on a merge of three buffered images. The line that combines the buffered images is is hand_img = np.array(np.minimum(255, np.multiply.reduce(self.buf)), np.uint8) which is then morphed once again (closing) and finally contour detected. The results are slightly less horrible than in the picture above but laggy instead.
Alternatively I tried to use an existing CNN (https://github.com/victordibia/handtracking) for detecting the approximate position of the hand's center (this step works) and then flood from there. In order to detect contours the result is put into an OTSU filter and then the largest contour is taken, resulting in the following picture (ignore black rectangles in the left). The problem is that some of the noise is flooded as well and the results are mediocre:
Finally, I tried background removers such as MOG2 or GMG. They are confused by the enormous amount of fast-moving noise. Also they cut off the fingertips (which are crucial for this project). Finally, they don't see enough details in the hand (8 bit plus further color reduction via equalizeHist yield a very poor grayscale resolution) to reliably detect small movements.
It's ridiculous how simple it is for a human to see the exact precise shape of the hand in the first picture and how incredibly hard it is for the computer to draw a shape.
What would be your recommended method to achieve an exact hand segmentation?

After two days of desperate testing, the solution was to VERY carefully apply thresholding to an well-preprocessed image.
Here are the steps:
Remove as much noise as you possibly can. In my case, denoising was done using Intel's pyrealsense2 (I'm using an Intel RealSense depth camera and the algorithms were written for that camera family, thus they work very well). I used rs.temporal_filter() and directly after rs.hole_filling_filter() on every frame.
Capture the very first frame. Besides capturing the exact distance to the table (for later thresholding), this step also saves a still picture that is blurred by a 100x100 px kernel. Since the camera is never mounted perfectly but slightly tilted, there's an ugly grayscale gradient going over the picture and making operations impossible. This still picture is then subtracted from every single later frame, eliminating the gradient. BTW: this gradient removal step is already incorporated in the screenshots shown in the question above
Now the picture is almost noise-free. Do not use equalizeHist. This does not simply increase the general contrast regularly but instead empathizes the remaining noise way too much. This was my main error I did in almost all experiments. Instead, apply a threshold (binary with fixed border) directly. The border is extremely thin, setting it at 104 instead of 205 makes a huge difference.
Invert colors (unless you have taken BINARY_INV in the previous step), apply contours, take the largest one and write it to a mask
Voilà!

OpenCV - detecting missing coins from a tray using a live camera feed

I am building a system which detects coins that are picked up from a tray. This tray will be kept in a public place. People will pick up one or more coins, but would be expected to keep them back after some time.
I would have a live stream through a webcam placed at the top. I will have a calibration step, say at the beginning of the day, that captures the initial state of the tray to be used for comparing with the live feed. A few slots might be empty to begin with, as you can see in the sample image.
I need to detect slots that had a coin initially, but are missing the same at any given point of time during the day.
I am trying out a few approaches using OpenCV:
SSIM difference: I can use SSIM to find diff between my live image frame and initial state. However, a number of slots are larger than the corresponding coin sizes (e.g. top two rows). This could mean that if the coin was originally placed at the center, but was later put back to touch one of the edges, we may get a false positive.
Blob detection: Alternatively, I can pre-feed (or detect) slot co-ordinates. Then do a blob detection within every slot. If a blob was present in the original state, but is missing in a camera frame, this would mean a coin has been picked up. However, accurate blob detection could be a challenge if the contrast between the coin and the tray is low.
I might also need to watch out for slight variations in lighting due to shadows of people moving around.
Any thoughts on these or any pointers on alternate approaches that can be tried out? Is there any analogous implementation that I can learn from?
Many thanks in advance.
Edit: Thanks to #I.Newton's suggestion. For those who stumble upon this question and would benefit from a sample implementation, look here: https://github.com/kewats/computer-vision-samples/tree/master/image-processing/missing-coins-detection

If you complete control over the lighting conditions, you can use simple color thresholding to solve the problem.
First make a mask for the boxes. You can do it in multiple ways by color threshold or by using adaptive threshold or canny edge etc. I did by color threshold
Then make a mask for the coins by the same method.
Now flood fill your box mask from from the center of each of this coins. It'll retain only those which do not have the coins.
Now you can compare this with your initial mask to figure out if all the coins are present
This does not include frame subtraction. So you need not worry about different position of coin in the box. Only thing you need to make sure is the lighting conditions for making the masks. If you want to make sure the coins are returned to the same box, you should go for template matching etc which again needs effort.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.