Robust Hand Detection via Computer Vision - python

I am currently working on a system for robust hand detection.
The first step is to take a photo of the hand (in HSV color space) with the hand placed in a small rectangle to determine the skin color. I then apply a thresholding filter to set all non-skin pixels to black and all skin pixels to white.
So far it works quite well, but I wanted to ask if there is a better way to solve this? For example, I found a few papers mentioning concrete color spaces for Caucasian people, but none with a comparison of Asian/African/Caucasian skin tones.
By the way, I'm working with OpenCV via Python bindings.

Have you taken a look at the CAMShift paper by Gary Bradski? You can download it from here.
I used the skin detection algorithm a year ago for detecting skin regions for hand tracking, and it is robust. It depends on how you use it.
The first problem with using color for tracking is that it is not robust to lighting variations or, as you mentioned, to people having different skin tones. However, this can be solved easily, as mentioned in the paper, by:
Convert the image to HSV color space.
Throw away the V channel and consider only the H and S channels, thereby discounting lighting variations.
Threshold pixels with low saturation due to their instability.
Bin the selected skin region into a 2D histogram (OpenCV's calcHist function). This histogram now acts as a model for skin.
Compute the "back projection" (i.e. use the histogram to compute the "probability" that each pixel in your image has the color of skin tone) using calcBackProject. Skin regions will have high values.
You can then either use meanShift to look for the mode of the 2D "probability" map generated by the back projection, or detect blobs of high "probability".
Throwing away the V channel in HSV and considering only the H and S channels is (surprisingly) enough to detect different skin tones under different lighting variations. A plus is that the computation is fast.
These steps and the corresponding code can be found in the original OpenCV book.
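For reference, a minimal sketch of those steps with the OpenCV Python bindings might look roughly like this (the file name, the calibration rectangle and the threshold values are assumptions, not something from the paper):
import cv2
import numpy as np

frame = cv2.imread('hand.jpg')                        # hypothetical input frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# hypothetical calibration rectangle where the user placed the hand
x, y, w, h = 200, 150, 40, 40
roi = hsv[y:y + h, x:x + w]

# 2D H-S histogram of the skin sample (the V channel is dropped)
skin_hist = cv2.calcHist([roi], [0, 1], None, [30, 32], [0, 180, 0, 256])
cv2.normalize(skin_hist, skin_hist, 0, 255, cv2.NORM_MINMAX)

# back-project the model onto the whole frame: high values = skin-like pixels
backproj = cv2.calcBackProject([hsv], [0, 1], skin_hist, [0, 180, 0, 256], 1)

# suppress unstable low-saturation pixels, then threshold
sat_mask = cv2.inRange(hsv, (0, 60, 32), (180, 255, 255))
backproj = cv2.bitwise_and(backproj, backproj, mask=sat_mask)
_, skin_mask = cv2.threshold(backproj, 50, 255, cv2.THRESH_BINARY)

# optionally look for the mode of the probability map with meanShift
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
ret, track_window = cv2.meanShift(backproj, (x, y, w, h), term_crit)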
As a side note, I've also used Gaussian Mixture Models (GMM) before. If you are only considering color, then I would say using histograms or a GMM does not make much difference. In fact, the histogram may perform better (if your GMM is not constructed to account for lighting variations etc.). A GMM is good if your sample vectors are more sophisticated (i.e. you consider other features), but speed-wise the histogram is much faster, because computing the probability map with a histogram is essentially a table lookup, whereas a GMM requires a matrix computation (for vectors with dimension > 1, in the formula for the multi-dimensional Gaussian distribution), which can be time-consuming for real-time applications.
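To illustrate the speed difference: with a histogram the per-pixel step is just indexing, while a GMM evaluates a Gaussian per component for every pixel. A minimal GMM sketch using scikit-learn (the sample arrays here are random placeholders, not real skin data):
import numpy as np
from sklearn.mixture import GaussianMixture

skin_samples = np.random.rand(500, 2)         # placeholder (H, S) samples from the calibration patch
gmm = GaussianMixture(n_components=2, covariance_type='full').fit(skin_samples)

pixels = np.random.rand(10000, 2)             # placeholder (H, S) values for every pixel of a frame
log_likelihood = gmm.score_samples(pixels)    # one multi-dimensional Gaussian evaluation per component, per pixel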
So in conclusion, if you are only trying to detect skin regions using color, then go with the histogram method. You can adapt it to consider the local gradient as well (i.e. a histogram of gradients, though probably not going to the full extent of Dalal and Triggs' human detection algorithm) so that it can differentiate between skin and regions with similar color (e.g. cardboard or wooden furniture) using local texture information. But that would require more effort.
For sample source code on how to use a histogram for skin detection, you can take a look at OpenCV's page here. But do note that it is mentioned on that webpage that they only use the hue channel, and that using both hue and saturation would give better results.
For a more sophisticated approach, you can take a look at the work on "Detecting naked people" by Margaret Fleck and David Forsyth. This was one of the earlier works on detecting skin regions that considers both color and texture. The details can be found here.
A great resource for source code related to computer vision and image processing, which happens to include code for visual tracking, can be found here. And no, it's not OpenCV.
Hope this helps.

Here is a paper on adaptive gaussian mixture model skin detection that you might find interesting.
Also, I remember reading a paper (unfortunately I can't seem to track it down) that used a very clever technique, but it required that you have the face in the field of view. The basic idea was to detect the person's face and use the skin patch detected on the face to identify the skin color automatically. Then a Gaussian mixture model is used to isolate the skin pixels robustly.
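I don't have that paper, but a rough sketch of the idea with OpenCV's bundled Haar face detector could look like this (the patch geometry and file name are assumptions; feed the sampled pixels to whatever skin model you use):
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
img = cv2.imread('frame.jpg')                           # hypothetical input
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
if len(faces) > 0:
    x, y, w, h = faces[0]
    # sample the middle of the face as the skin-colour reference
    patch = cv2.cvtColor(img[y + h // 4:y + 3 * h // 4, x + w // 4:x + 3 * w // 4], cv2.COLOR_BGR2HSV)
    skin_pixels = patch.reshape(-1, 3)[:, :2]           # H and S values to build the histogram / GMM from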
Finally, Google Scholar may be a big help in searching for the state of the art in skin detection. It's heavily researched in academia right now, as well as used in industry (e.g., Google Images and Facebook upload picture policies).

I worked on something similar 2 years ago. You can try a particle filter (Condensation), using skin color pixels as input for initialization. It is quite robust and fast.
The way I applied it for my project is at this link. You have both a presentation (slides) and the survey.
If you initialize the color of the hand with the real color extracted from the hand you are going to track, you shouldn't have any problems with darker skin tones.
For the particle filter, I think you can find some sample code implementations. Good luck.
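As a rough idea of what one Condensation/particle-filter iteration looks like, here is a generic sketch (not the code from my project; the motion model and constants are assumptions):
import numpy as np

def particle_filter_step(particles, weights, skin_prob, motion_std=10.0):
    # particles: N x 2 array of (x, y) hand-centre hypotheses
    # skin_prob: H x W map of per-pixel skin probability (e.g. a back-projection)
    n, (h, w) = len(particles), skin_prob.shape
    idx = np.random.choice(n, size=n, p=weights)           # resample according to previous weights
    particles = particles[idx]
    particles = particles + np.random.randn(n, 2) * motion_std   # random-walk prediction
    particles[:, 0] = np.clip(particles[:, 0], 0, w - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, h - 1)
    weights = skin_prob[particles[:, 1].astype(int), particles[:, 0].astype(int)] + 1e-6
    weights = weights / weights.sum()                       # measurement update from skin colour
    estimate = (particles * weights[:, None]).sum(axis=0)   # weighted mean = estimated hand position
    return particles, weights, estimate

# initialise the particles uniformly over detected skin pixels, set the weights to 1/N,
# then call particle_filter_step once per frame.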

It will be hard for you to find skin tone based on color only.
First of all, it depends strongly on the automatic white balance algorithm.
For example, in this image, any person can see that the color is skin tone. But for the computer it will be blue.
Second, correct color calibration in digital cameras is a hard thing, and it will rarely be accurate enough for your purposes.
You can browse www.DPReview.com to understand what I mean.
In conclusion, I truly believe that the color by itself can be an input, but it is not enough.

Well, my experience with skin modeling is bad, because:
1) lighting can vary, so skin segmentation is not robust
2) it will also mark your face (and other skin-like objects)
I would use machine learning techniques like Haar training, which, in my opinion, is a far better approach than modeling and fixing some constraints (like skin detection + thresholding...).

As something more robust than pixel colour, you can use a hand geometry model. First project the model for a particular gesture and then cross-correlate it with the source image. Here is a demo of this technique.
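If you want to experiment with that, cv2.matchTemplate does the cross-correlation; a small sketch (the template file and the confidence threshold are assumptions):
import cv2

src = cv2.imread('binary_hand.png', cv2.IMREAD_GRAYSCALE)          # hypothetical thresholded frame
template = cv2.imread('palm_template.png', cv2.IMREAD_GRAYSCALE)   # hypothetical projected gesture model
response = cv2.matchTemplate(src, template, cv2.TM_CCORR_NORMED)   # normalised cross-correlation
_, max_val, _, max_loc = cv2.minMaxLoc(response)
if max_val > 0.7:                                                  # assumed confidence threshold
    print('gesture found at', max_loc)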

Related

Edge detection on a gray image

I need some advice on a computer vision project that I am working on. I am trying to extract a corner in the image below. The edge I'm searching for is marked yellow in the right image. The edge detection always fails because the edge is too blurred in the middle.
I run this process with OpenCV and Python.
I started by removing the white dots with a threshold method. After that, a big median blur (31-53). After that, an adaptive threshold method to separate the areas left and right of the corner. But the separation is always bad because the edge is barely visible.
Is there some other way to extract this edge, or do I have to try with a better camera?
Thanks for your help.
First, do you have another dataset? It is hard to discuss from just one input.
A couple of things you can do:
The best option is to change the camera or imaging technique to get a better and clearer edge.
When that is hard to do, try model-based fitting. If your images are repeatable across the whole class: I can observe some circles on the right and two sharp straight-line edges on the left, and the soft edge you want is in the middle of those two apparent features. That can be considered as a model. You can then use some other technique for the pixels in between those two regions (because the regions themselves are easy to detect). Those techniques include, but are not limited to, histogram equalization, high-pass filtering, or even a wavelet transform.
The worst way is to use parameter fitting. What you want to segment is neither a strong edge nor a smooth plane, so you could tweak the Canny edge detector to find edges that are not so strong. I do not recommend this method; if you really have no choice and no other images, then you can try it.
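If you do try that route anyway, a minimal sketch of the idea (the file name, blur size and thresholds are assumptions you would have to tune):
import cv2

img = cv2.imread('part.png', cv2.IMREAD_GRAYSCALE)
blurred = cv2.medianBlur(img, 31)         # large median blur, as in your current pipeline
edges = cv2.Canny(blurred, 10, 40)        # deliberately low thresholds to keep the weak edge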
The last way is to use a deep-learning-based method to train a model and automatically segment this part out. This method might work, but it needs you to have hundreds if not thousands of samples and labels.
Regards
Shenghai Yuan

Convert Edge Detection to Mask

Given an image to which I applied an edge detection filter, what would be the way (hopefully an efficient/performant one) to achieve a mask of the "sum" of the points in a marked segment?
Image for illustration:
Thank you in advance.
UPDATE:
Added example of a lighter image (https://imgur.com/a/MN0t3pH).
As you'll see in the below image, we assume that when the user marks a region (ROI), there will be an object that will "stand out" from its background. Our end goal is to get the most accurate "mask" of this object, so we can use it for ML processing.
From the two examples you've uploaded, I can assume you are thresholding based on a difference in color/intensity. I can suggest GrabCut as a basic foreground separation: use the edges in the mask in that ROI as input to the algorithm.
Even better: if your thresholding is as good as in the first image, just skip the edge detection part, and that will be the input to GrabCut.
======= EDIT =======
@RoiMulia, if you need production-level results, I suggest you leave the threshold + edge detection direction completely and try background removal techniques (the SOTA is currently neural networks such as Background Matting: The World is Your Green Screen (example)).
You can also try some ready made background removal APIs such as https://www.remove.bg/ or https://clippingmagic.com/
1.
Given the "ROI" supervision you have, I strongly recommend you to explore GrabCut (as proposed by YoniChechnik):
Rother C, Kolmogorov V, Blake A. "GrabCut" interactive foreground extraction using iterated graph cuts. ACM transactions on graphics (TOG). 2004.
To get a feeling for how this works, you can use PowerPoint's "background removal" tool:
which is based on the GrabCut algorithm.
This is how it looks in PowerPoint:
GrabCut segments the foreground object in a selected ROI based mainly on its foreground/background color distributions, and less on edge/boundary information, though this extra information can be integrated into the formulation.
It seems like OpenCV has a basic implementation of GrabCut, see here.
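A minimal sketch of how that OpenCV implementation is typically driven from a user-drawn ROI (the file name and rectangle are assumptions):
import cv2
import numpy as np

img = cv2.imread('input.jpg')                       # hypothetical input image
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
rect = (50, 50, 300, 400)                           # hypothetical user-drawn ROI (x, y, w, h)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
# definite + probable foreground pixels form the object mask
fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype('uint8')
foreground = cv2.bitwise_and(img, img, mask=fg_mask)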
2.
If you are seeking a method that uses only the boundary information, you might find this answer useful.
3.
An alternative method is to use NCuts:
Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence. 2000.
If you have a very reliable edge map, you can modify the "affinity matrix" that NCuts works with to be a binary matrix:
w_ij = 1 if there is no boundary between i and j,
w_ij = 0 if there is a boundary between i and j,
w_ij = 0 if i and j are not neighbors of each other.
NCuts can be viewed as a way to estimate "robust connected components".
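A toy sketch of that binary-affinity idea, using scikit-learn's spectral clustering as the NCuts solver (illustrative only and slow on anything but small images; 'edge_map' is assumed to be a boolean H x W array that is True on boundary pixels):
import numpy as np
from scipy.sparse import lil_matrix
from sklearn.cluster import spectral_clustering

def ncut_from_edges(edge_map, n_segments=2):
    h, w = edge_map.shape
    n = h * w
    affinity = lil_matrix((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy, dx in ((0, 1), (1, 0)):            # 4-neighbourhood (right, down)
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    j = yy * w + xx
                    # w_ij = 1 only if neither pixel lies on a boundary
                    if not edge_map[y, x] and not edge_map[yy, xx]:
                        affinity[i, j] = 1
                        affinity[j, i] = 1
    labels = spectral_clustering(affinity.tocsr(), n_clusters=n_segments)
    return labels.reshape(h, w)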

image comparison using edges and textures

I am working with Python and OpenCV on a piece of software which should compare two images and return, as a result, a value representing their similarity.
I tried first with histograms, and then with SIFT and SURF, but the first method is not localized, while the second and third are slow and do not fit well with my dataset's content (mostly pictures of crowds).
I would avoid a people detector, so I would like to apply some algorithm based on edge and texture comparison. Can you give some hints or online resources?
This is an interesting, although challenging problem! Recently, I came across an article by the University of California, San Diego's Vision Group about classifying scenes of crowds. Here is the link: Urban Tribes: Analyzing Group Photos from a Social Perspective.
As you can see, there is no one-size-fits-all solution, but I would think that this should provide you a good place to start from.
What you're asking is a general image classification framework.
Try googling: image classification, scene classification, image Indexing and Retrieval.
In most cases, you'll have to use a multimodal descriptor. Use color, texture, entropy, keypoints, edge histograms.
You can read this and try that.
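As a concrete (though simplistic) starting point, here is a sketch that combines a colour histogram with an edge-orientation histogram and mixes the two similarities; the file names, bin counts and weights are all assumptions to tune on your crowd dataset:
import cv2
import numpy as np

def describe(path):
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    color_hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    cv2.normalize(color_hist, color_hist, 0, 1, cv2.NORM_MINMAX)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    edge_hist, _ = np.histogram(ang, bins=36, range=(0, 360), weights=mag)   # gradient-weighted orientation histogram
    edge_hist = edge_hist / (edge_hist.sum() + 1e-6)
    return color_hist, edge_hist

c1, e1 = describe('crowd_a.jpg')                     # hypothetical images
c2, e2 = describe('crowd_b.jpg')
color_sim = cv2.compareHist(c1, c2, cv2.HISTCMP_CORREL)
edge_sim = 1.0 - 0.5 * np.abs(e1 - e2).sum()         # 1 = identical orientation distributions
similarity = 0.5 * color_sim + 0.5 * edge_sim        # assumed equal weighting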

How do I find Wally with Python?

Shamelessly jumping on the bandwagon :-)
Inspired by How do I find Waldo with Mathematica and the followup How to find Waldo with R, as a new python user I'd love to see how this could be done. It seems that python would be better suited to this than R, and we don't have to worry about licenses as we would with Mathematica or Matlab.
In an example like the one below obviously simply using stripes wouldn't work. It would be interesting if a simple rule based approach could be made to work for difficult examples such as this.
I've added the [machine-learning] tag as I believe the correct answer will have to use ML techniques, such as the Restricted Boltzmann Machine (RBM) approach advocated by Gregory Klopper in the original thread. There is some RBM code available in python which might be a good place to start, but obviously training data is needed for that approach.
At the 2009 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009) they ran a data analysis competition: Where's Wally?. Training data is provided in Matlab format. Note that the links on that website are dead, but the data (along with the source of an approach taken by Sean McLoone and colleagues) can be found here (see the SCM link). It seems like one place to start.
Here's an implementation with mahotas
from pylab import imshow
import numpy as np
import mahotas
wally = mahotas.imread('DepartmentStore.jpg')
wfloat = wally.astype(float)
r,g,b = wfloat.transpose((2,0,1))
Split into red, green, and blue channels. It's better to use floating point arithmetic below, so we convert at the top.
w = wfloat.mean(2)
w is the white channel.
pattern = np.ones((24, 16), float)
for i in range(2):
    pattern[i::4] = -1
Build up a pattern of +1,+1,-1,-1 on the vertical axis. This is Wally's shirt.
v = mahotas.convolve(r-w, pattern)
Convolve with red minus white. This will give a strong response where the shirt is.
mask = (v == v.max())
mask = mahotas.dilate(mask, np.ones((48,24)))
Look for the maximum value and dilate it to make it visible. Now, we tone down the whole image, except the region of interest:
wally = (wally * (1 - 0.8 * (~mask[:, :, None]))).astype(np.uint8)
imshow(wally)
And we get the result!
You could try template matching, then note which match produced the highest resemblance, and then use machine learning to narrow it down more. That is also very difficult, and with the accuracy of template matching, it may just return every face or face-like image. I am thinking you will need more than just machine learning if you hope to do this consistently.
Maybe you should start by breaking the problem into two smaller ones:
create an algorithm that separates people from the background.
train a neural network classifier with as many positive and negative examples as possible.
Those are still two very big problems to tackle...
BTW, I would choose C++ and OpenCV; it seems much better suited for this.
This is not impossible, but very difficult, because you really have no example of a successful match. If there are multiple states (in this case, more examples of 'Find Wally' drawings), you can feed multiple pictures into an image recognition program, treat it as a hidden Markov model, and use something like the Viterbi algorithm for inference ( http://en.wikipedia.org/wiki/Viterbi_algorithm ).
That's the way I would approach it, assuming you have multiple images from which you can give it examples of the correct answer so it can learn. If you only have one picture, then I'm sorry, you may need to take another approach.
I recognized that there are two main features which are almost always visible:
the red-white striped shirt
dark brown hair under the fancy cap
So I would do it the following way (a rough sketch of the colour-masking step follows the list):
search for striped shirts:
filter out red and white color (with thresholds on the HSV converted image). That gives you two mask images.
add them together -> that's the main mask for searching striped shirts.
create a new image with all the filtered out red converted to pure red (#FF0000) and all the filtered out white converted to pure white (#FFFFFF).
now correlate this pure red-white image with a stripe pattern image (I think all the Waldos have quite perfect horizontal stripes, so rotation of the pattern shouldn't be necessary). Do the correlation only inside the above-mentioned main mask.
try to group together clusters which could have resulted from one shirt.
If there is more than one 'shirt', that is, more than one cluster of positive correlation, search for other features, like the dark brown hair:
search for brown hair
filter out the specific brown hair color using the HSV converted image and some thresholds.
search for a certain area in this masked image - not too big and not too small.
now search for a 'hair area' that is just above a (before) detected striped shirt and has a certain distance to the center of the shirt.
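A rough sketch of the red/white masking step (the HSV ranges and the file name are assumptions; red wraps around the hue axis, hence the two ranges):
import cv2
import numpy as np

img = cv2.imread('scene.jpg')                         # hypothetical Wally scene
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# red wraps around the hue axis, so combine two ranges
red1 = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255))
red2 = cv2.inRange(hsv, (170, 120, 80), (180, 255, 255))
red_mask = cv2.bitwise_or(red1, red2)
white_mask = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))
main_mask = cv2.bitwise_or(red_mask, white_mask)      # the search mask for striped shirts
# pure red / pure white image to correlate against a horizontal stripe pattern
pure = np.zeros_like(img)
pure[red_mask > 0] = (0, 0, 255)                      # BGR pure red
pure[white_mask > 0] = (255, 255, 255)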
Here's a solution using neural networks that works nicely.
The neural network is trained on several solved examples that are marked with bounding boxes indicating where Wally appears in the picture. The goal of the network is to minimize the error between the predicted box and the actual box from training/validation data.
The network above uses the TensorFlow Object Detection API to perform training and prediction.

Should I do Material Classification/Recognition using histograms or some other higher math tool, like Bayesian Networks?

I'm learning the basics of OpenCV, and I thought a good project would help me make studying more fun. After thinking of some ideas, I came up with a material recognition project. Let's say I got myself a conveyor and it's transporting material for the production of some product (the product doesn't really matter, though). There are 3 materials, and the illumination conditions will vary (natural light from morning through afternoon, and a light bulb at night). That would be the problem description.
I was thinking of using sand, wood and rocks, which are easy to get, and placing them on a plastic surface. After taking a picture, I'll compute a histogram to get the color, and using this color I'll identify the material. But since the lighting conditions will change over time, when I take this photograph and compute the histogram, the color will change and the material won't be recognized properly. And I thought: what if I were to use sand and dust? They have very similar colors but different textures. Is there something that can help me with that?
I just want some ideas, and maybe some expert in the field could guide me.
Quite an advanced idea for a starting project. The differences in lighting could be tackled by using HSV or other color spaces and taking the hue component. However, the matter of "texture" can be handled in two ways:
Feature descriptors: If you deal with the grey-level image, there is a set of feature descriptors called the Grey Level Co-occurrence Matrix (GLCM) that gives a measure of the textures of different regions in the image. This is present in Matlab; for OpenCV there is the following code, in C.
So you could take several standard shots of the sand, wood and rocks and use them as training samples for a classifier - NN, SVM, OpenCV's Haar classifier, whatever. Then train it with negative samples. The feature vector for the classifier will be the GLCM output for each picture. Then run it on the actual pictures and see how accurate it is.
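A minimal sketch of extracting GLCM features in Python with scikit-image (the patch here is a random placeholder; older scikit-image versions spell the functions greycomatrix/greycoprops):
import numpy as np
from skimage.feature import graycomatrix, graycoprops

patch = (np.random.rand(64, 64) * 255).astype('uint8')   # placeholder grey-level material patch
glcm = graycomatrix(patch, distances=[1, 2], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = [graycoprops(glcm, prop).ravel()
            for prop in ('contrast', 'homogeneity', 'energy', 'correlation')]
feature_vector = np.concatenate(features)                 # feed this to your NN / SVM classifier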
Texture roughness: I came across this useful paper that presents a single-valued measure of the 'roughness' of a texture, called the Eigen Transform. The calculations are quite simple, especially if you use OpenCV's SVD() for the eigenvalue calculations. The result of the Eigen Transform is a value corresponding to the roughness of that portion of the image. This can be used to separate out the required portions.
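A small sketch of that roughness idea, using the singular values of a local patch (the window size and the number of values kept are assumptions; the paper gives the exact definition):
import numpy as np

def roughness(patch, k=8):
    # singular values of the patch, largest first
    s = np.linalg.svd(patch.astype(float), compute_uv=False)
    return s[-k:].mean()          # mean of the k smallest: high for rough texture, low for smooth

# slide e.g. a 16x16 window over the image and keep one roughness value per position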
