I am working with Python and OpenCV on a piece of software that should compare two images and return a value representing their similarity.
I first tried histograms, and then SIFT and SURF, but the first method is not localized, while the second and third are slow and do not fit my dataset content very well (mostly pictures of crowds).
I would like to avoid people detectors, so I would prefer an algorithm based on edge and texture comparison. Can you give me some hints or an online resource?
This is an interesting, although challenging problem! Recently, I came across an article by the University of California, San Diego's Vision Group about classifying scenes of crowds. Here is the link: Urban Tribes: Analyzing Group Photos from a Social Perspective.
As you can see, there is no one-size-fits-all solution, but I think this should give you a good place to start.
What you're asking is a general image classification framework.
Try googling: image classification, scene classification, image indexing and retrieval.
In most cases, you'll have to use a multimodal descriptor. Use color, texture, entropy, keypoints, edge histograms.
You can read this and try that.
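For illustration, here is a minimal sketch of one such multimodal descriptor: a color histogram concatenated with an edge-orientation histogram, compared with a simple distance. The file names and bin counts are placeholders, not tuned recommendations.

import cv2
import numpy as np

def describe(image):
    # Color: 3D HSV histogram, flattened and normalized
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    color_hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 8, 8],
                              [0, 180, 0, 256, 0, 256]).flatten()
    color_hist /= color_hist.sum() + 1e-7

    # Texture/edges: histogram of gradient orientations weighted by magnitude
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy, angleInDegrees=True)
    edge_hist, _ = np.histogram(ang, bins=18, range=(0, 360), weights=mag)
    edge_hist /= edge_hist.sum() + 1e-7

    return np.concatenate([color_hist, edge_hist])

# Similarity as the inverse of the descriptor distance (file names are placeholders)
a = describe(cv2.imread('crowd1.jpg'))
b = describe(cv2.imread('crowd2.jpg'))
similarity = 1.0 / (1.0 + np.linalg.norm(a - b))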
Steps I have followed:
Background subtraction with preprocessing.
Contour detection.
With these two steps, I am able to draw contours around all moving cars in the video. But how do I track the contours to count the number of cars in the video?
I searched around a bit and there seem to be different techniques like the Kalman filter, Lucas-Kanade and optical flow... but I don't know which one to use for my use case. I am using opencv3-python.
Actually, this is a fairly general question, but I am going to give a point of view (I had the same problem, although with point clouds; it may be different from what you asked, but I hope it will give you an idea of how to proceed).
Most of the time, once your contours are detected, tracking moving objects in the scene involves three main steps:
Feature Matching
This step is about detecting features in your object (frame N) and matching them to features of objects in frame N+1. The detection part has standard algorithms and descriptors available in OpenCV (SURF, SIFT, ORB, ...), and so does the feature matching part.
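A minimal ORB detection-and-matching sketch between two consecutive frames, assuming the frames are available as grayscale images (the file names are placeholders):

import cv2

# Two consecutive grayscale frames (placeholder file names)
frame_n = cv2.imread('frame_100.png', cv2.IMREAD_GRAYSCALE)
frame_n1 = cv2.imread('frame_101.png', cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary descriptors in both frames
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(frame_n, None)
kp2, des2 = orb.detectAndCompute(frame_n1, None)

# Brute-force matching with Hamming distance (suited to ORB's binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Each match links a keypoint in frame N to one in frame N+1, giving
# candidate correspondences for the objects you are tracking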
Kalman Filter
The Kalman filter is used to get an initial prediction (generally by applying a constant velocity model for your objects). For each appearance-point of the track, a correspondence search is executed. If the average distance is above a specified threshold, feature matching is applied to get a better initial estimate.
In order to do that, you need to model your problem in a way that it can be solved by a Kalman filter.
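To make this concrete, a small sketch of a constant-velocity Kalman filter in OpenCV, where the measurement is the centroid of a matched contour; the noise values and the centroid are placeholders:

import cv2
import numpy as np

# State: [x, y, vx, vy]; measurement: [x, y] (e.g. a contour centroid)
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)   # constant velocity model
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

# For each new frame: predict where the object should be,
# then correct with the measured centroid of the matched contour
prediction = kf.predict()                      # prior estimate of (x, y, vx, vy)
cx, cy = 120.0, 80.0                           # placeholder centroid measurement
estimate = kf.correct(np.array([[cx], [cy]], np.float32))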
Dynamic Mapping
After the motion estimation, the appearance of each track is updated. In contrast to standard mapping techniques, dynamic mapping is an approach that tries to accumulate appearance details of both static and dynamic objects, thus refining your motion estimation and tracking process.
There are a lot of papers out there, you may as well take a further look at these papers :
Robust Visual Tracking and Vehicle Classification via Sparse Representation
Motion Estimation from Range Images in Dynamic Outdoor Scenes
Multiple Objects Tracking using CAMshift Algorithm in OpenCV
Hope it helps!
I am trying to detect a vehicle in an image (actually a sequence of frames in a video). I am new to OpenCV and Python and work under Windows 7.
Is there a way to get horizontal edges and vertical edges of an image and then sum up the resultant images into respective vectors?
Is there Python code or a function available for this?
I looked at this and this but could not figure out how to do it.
You may use the following image for illustration.
EDIT
I was inspired by the idea presented in the following paper (sorry if you do not have access).
Betke, M.; Haritaoglu, E. & Davis, L. S. Real-time multiple vehicle detection and tracking from a moving vehicle Machine Vision and Applications, Springer-Verlag, 2000, 12, 69-83
I would take a look at the squares example for OpenCV, posted here. It uses Canny and then does a contour find to return the sides of each square. You should be able to modify this code to get the horizontal and vertical lines you are looking for. Here is a link to the documentation for the Python call of Canny. It is rather helpful for all-around edge detection. In about an hour I can get home and give you a working example of what you want.
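In the meantime, a minimal sketch of the Canny-plus-contours idea (the file name is a placeholder; the indexing trick covers the differing return values of findContours across OpenCV versions):

import cv2

img = cv2.imread('traffic_frame.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# findContours returns 2 or 3 values depending on the OpenCV version;
# taking [-2] always gives the contour list
contours = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
cv2.drawContours(img, contours, -1, (0, 255, 0), 2)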
Do some reading on Sobel filters.
http://en.wikipedia.org/wiki/Sobel_operator
You can basically get vertical and horizontal gradients at each pixel.
Here is the OpenCV function for it.
http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#sobel
Once you have these filtered images, you can collect statistics column- and row-wise, decide whether there is an edge, and get its location.
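A minimal sketch of that idea, assuming a single grayscale frame (the file name and the peak threshold are placeholders):

import cv2
import numpy as np

img = cv2.imread('vehicle.png', cv2.IMREAD_GRAYSCALE)

# Sobel derivatives: the x-derivative responds to vertical edges,
# the y-derivative to horizontal edges
sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Sum the absolute responses into one vector per direction
vertical_profile = np.abs(sobel_x).sum(axis=0)     # one value per column
horizontal_profile = np.abs(sobel_y).sum(axis=1)   # one value per row

# Peaks in the profiles indicate columns/rows with strong edges
edge_columns = np.argwhere(vertical_profile > vertical_profile.mean() + 2 * vertical_profile.std())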
Typically, geometrical approaches to object detection are not hugely successful, as the appearance model you assume can quite easily be violated by occlusion, noise or orientation changes.
Machine learning approaches typically work much better in my opinion and would probably provide a more robust solution to your problem. Since you appear to be working with OpenCV, you could take a look at Cascade Classifiers, for which OpenCV provides Haar wavelet and local binary pattern feature based classifiers.
The link I have provided is to a tutorial with very complete steps explaining how to create a classifier with several prewritten utilities. Basically, you will create a directory with 'positive' images of cars and a directory with 'negative' images of typical backgrounds. A utility, opencv_createsamples, can be used to create training images warped to simulate different orientations and average intensities from a small set of images. You then use the utility opencv_traincascade, setting a few command line parameters to select different training options, and it outputs a trained classifier for you.
Detection can be performed using either the C++ or the Python interface with this trained classifier.
For instance, using Python you can load the classifier and perform detection on an image getting back a selection of bounding rectangles using:
import cv2

image = cv2.imread('path/to/image')
cc = cv2.CascadeClassifier('path/to/classifierfile')
objs = cc.detectMultiScale(image)   # list of (x, y, w, h) bounding rectangles
Shamelessly jumping on the bandwagon :-)
Inspired by How do I find Waldo with Mathematica and the follow-up How to find Waldo with R, as a new Python user I'd love to see how this could be done. It seems that Python would be better suited to this than R, and we don't have to worry about licenses as we would with Mathematica or Matlab.
In an example like the one below obviously simply using stripes wouldn't work. It would be interesting if a simple rule based approach could be made to work for difficult examples such as this.
I've added the [machine-learning] tag as I believe the correct answer will have to use ML techniques, such as the Restricted Boltzmann Machine (RBM) approach advocated by Gregory Klopper in the original thread. There is some RBM code available in python which might be a good place to start, but obviously training data is needed for that approach.
At the 2009 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009) they ran a Data Analysis Competition: Where's Wally? Training data is provided in MATLAB format. Note that the links on that website are dead, but the data (along with the source of an approach taken by Sean McLoone and colleagues) can be found here (see SCM link). That seems like one place to start.
Here's an implementation with mahotas
from pylab import imshow
import numpy as np
import mahotas
wally = mahotas.imread('DepartmentStore.jpg')
wfloat = wally.astype(float)
r,g,b = wfloat.transpose((2,0,1))
Split into red, green, and blue channels. It's better to use floating point arithmetic below, so we convert at the top.
w = wfloat.mean(2)
w is the white channel.
pattern = np.ones((24,16), float)
for i in range(2):
    pattern[i::4] = -1
Build up a pattern of +1,+1,-1,-1 on the vertical axis. This is Wally's shirt.
v = mahotas.convolve(r-w, pattern)
Convolve with red minus white. This will give a strong response where the shirt is.
mask = (v == v.max())
mask = mahotas.dilate(mask, np.ones((48,24)))
Look for the maximum value and dilate it to make it visible. Now, we tone down the whole image, except the region of interest:
wally = (wally - .8 * wally * ~mask[:, :, None]).astype(np.uint8)
imshow(wally)
And we get the result, with everything dimmed except the region around Wally!
You could try template matching, noting which position produced the highest resemblance, and then use machine learning to narrow it down further. That is also very difficult, and given the accuracy of template matching, it may just return every face or face-like image. I think you will need more than just machine learning if you hope to do this consistently.
Maybe you should start by breaking the problem into two smaller ones:
create an algorithm that separates people from the background.
train a neural network classifier with as many positive and negative examples as possible.
Those are still two very big problems to tackle...
BTW, I would choose C++ and OpenCV; they seem much better suited for this.
This is not impossible, but it is very difficult because you really have no example of a successful match. If there are multiple states (in this case, more examples of Where's Wally drawings), you can feed multiple pictures into an image recognition program, treat it as a hidden Markov model, and use something like the Viterbi algorithm for inference (http://en.wikipedia.org/wiki/Viterbi_algorithm).
That's the way I would approach it, assuming you have multiple images that you can give it as examples of the correct answer so it can learn. If you only have one picture, then I'm sorry, there may be another approach you need to take.
I recognized that there are two main features which are almost always visible:
the red-white striped shirt
dark brown hair under the fancy cap
So I would do it the following way:
search for striped shirts:
filter out the red and white colors (with thresholds on the HSV converted image). That gives you two mask images (see the sketch after this list).
add them together -> that's the main mask for searching striped shirts.
create a new image with all the filtered out red converted to pure red (#FF0000) and all the filtered out white converted to pure white (#FFFFFF).
now correlate this pure red-white image with a stripe pattern image (I think all the Waldos have quite perfect horizontal stripes, so rotation of the pattern shouldn't be necessary). Do the correlation only inside the above mentioned main mask.
try to group together clusters which could have resulted from one shirt.
If there is more than one 'shirt', that is, more than one cluster of positive correlation, search for other features, like the dark brown hair:
search for brown hair
filter out the specific brown hair color using the HSV converted image and some thresholds.
search for a certain area in this masked image - not too big and not too small.
now search for a 'hair area' that is just above a previously detected striped shirt and has a certain distance to the center of the shirt.
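As mentioned above, a minimal sketch of the red/white masking steps; the file name and HSV thresholds are rough guesses, not tuned values:

import cv2
import numpy as np

img = cv2.imread('waldo_scene.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so combine two hue ranges
red1 = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
red2 = cv2.inRange(hsv, np.array([170, 120, 70]), np.array([180, 255, 255]))
red_mask = cv2.bitwise_or(red1, red2)

# White: any hue, low saturation, high value
white_mask = cv2.inRange(hsv, np.array([0, 0, 200]), np.array([180, 40, 255]))

# Main mask for searching striped shirts
main_mask = cv2.bitwise_or(red_mask, white_mask)

# 'Pure' red-white image to correlate against a stripe pattern
pure = np.zeros_like(img)
pure[red_mask > 0] = (0, 0, 255)        # pure red in BGR
pure[white_mask > 0] = (255, 255, 255)  # pure white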
Here's a solution using neural networks that works nicely.
The neural network is trained on several solved examples that are marked with bounding boxes indicating where Wally appears in the picture. The goal of the network is to minimize the error between the predicted box and the actual box from training/validation data.
The network above uses the TensorFlow Object Detection API to perform training and prediction.
I am currently working on a system for robust hand detection.
The first step is to take a photo of the hand (in HSV color space) with the hand placed in a small rectangle to determine the skin color. I then apply a thresholding filter to set all non-skin pixels to black and all skin pixels to white.
So far it works quite well, but I wanted to ask if there is a better way to solve this. For example, I found a few papers mentioning specific color spaces for Caucasian people, but none with a comparison of Asian/African/Caucasian color tones.
By the way, I'm working with OpenCV via Python bindings.
Have you taken a look at the camshift paper by Gary Bradski? You can download it from here
I used the skin detection algorithm a year ago for detecting skin regions for hand tracking, and it is robust. It depends on how you use it.
The first problem with using color for tracking is that it is not robust to lighting variations or like you mentioned, when people have different skin tones. However this can be solved easily as mentioned in the paper by:
Convert the image to HSV color space.
Throw away the V channel and consider only the H and S channels, hence discounting lighting variations.
Threshold pixels with low saturation due to their instability.
Bin the selected skin region into a 2D histogram (OpenCV's calcHist function). This histogram now acts as a model for skin.
Compute the "backprojection" (i.e. use the histogram to compute the "probability" that each pixel in your image has the color of skin tone) using calcBackProject. Skin regions will have high values.
You can then either use meanShift to look for the mode of the 2D "probability" map generated by the backprojection, or detect blobs of high "probability".
Throwing away the V channel in HSV and only considering H and S channels is really enough (surprisingly) to detect different skin tones and under different lighting variations. A plus side is that its computation is fast.
These steps and the corresponding code can be found in the original OpenCV book.
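For reference, a minimal sketch of those steps in the Python bindings; the file name, sampling rectangle, histogram bin counts and tracking window are all placeholders.

import cv2
import numpy as np

frame = cv2.imread('hand_frame.png')
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Patch known to contain skin (e.g. the rectangle where the hand is placed)
skin_patch = hsv[100:160, 200:260]

# 2D H-S histogram of the patch; ignore unstable low-saturation pixels via a mask
sat_mask = cv2.inRange(skin_patch, np.array([0, 40, 0]), np.array([180, 255, 255]))
hist = cv2.calcHist([skin_patch], [0, 1], sat_mask, [30, 32], [0, 180, 0, 256])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

# Back-project the skin model onto the whole frame: high values = likely skin
backproj = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)

# meanShift (or CamShift) then locks onto the densest skin region
track_window = (200, 100, 60, 60)   # initial (x, y, w, h)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
ret, track_window = cv2.meanShift(backproj, track_window, criteria)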
As a side note, I've also used Gaussian Mixture Models (GMM) before. If you are only considering color, then I would say using histograms or a GMM does not make much difference. In fact, the histogram would perform better (if your GMM is not constructed to account for lighting variations etc.). A GMM is good if your sample vectors are more sophisticated (i.e. you consider other features), but speed-wise the histogram is much faster, because computing the probability map using a histogram is essentially a table lookup, whereas a GMM requires a matrix computation (for vectors with dimension > 1, in the formula for the multi-dimensional Gaussian distribution), which can be time consuming for real-time applications.
So in conclusion, if you are only trying to detect skin regions using color, then go with the histogram method. You can adapt it to consider local gradients as well (i.e. a histogram of gradients, though possibly not going to the full extent of Dalal and Triggs's human detection algorithm) so that it can differentiate between skin and regions with similar color (e.g. cardboard or wooden furniture) using local texture information. But that would require more effort.
For sample source code on how to use a histogram for skin detection, you can take a look at OpenCV's page here. But do note that it is mentioned on that webpage that they only use the hue channel, and that using both hue and saturation would give a better result.
For a more sophisticated approach, you can take a look at the work on "Detecting naked people" by Margaret Fleck and David Forsyth. This was one of the earlier work on detecting skin regions that considers both color and texture. The details can be found here.
A great resource for source code related to computer vision and image processing, which happens to include code for visual tracking, can be found here. And no, it's not OpenCV.
Hope this helps.
Here is a paper on adaptive gaussian mixture model skin detection that you might find interesting.
Also, I remember reading a paper (unfortunately I can't seem to track it down) that used a very clever technique, but it required that you have the face in the field of view. The basic idea was detect the person's face, and use the skin patch detected from the face to identify the skin color automatically. Then, use a gaussian mixture model to isolate the skin pixels robustly.
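As a rough sketch of that idea (detect the face, sample its skin colour, then build a person-specific model), using the standard Haar face cascade shipped with OpenCV; the file names are placeholders and the details are only an illustration, not the paper's method:

import cv2
import numpy as np

img = cv2.imread('person.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect a face with one of the Haar cascades shipped with OpenCV
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

if len(faces) > 0:
    x, y, w, h = faces[0]
    # Sample a patch from the centre of the face as a skin-colour reference
    patch = img[y + h // 4: y + 3 * h // 4, x + w // 4: x + 3 * w // 4]
    hsv_patch = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_patch], [0, 1], None, [30, 32], [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    # The histogram (or a GMM fitted to the patch pixels) then serves as the
    # person-specific skin model for back-projection over the whole image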
Finally, Google Scholar may be a big help in searching for the state of the art in skin detection. It's heavily researched in academia right now as well as used in industry (e.g., Google Images and Facebook upload picture policies).
I worked on something similar two years ago. You can try a particle filter (Condensation), using skin color pixels as input for initialization. It is quite robust and fast.
The way I applied it for my project is at this link. You have both a presentation (slides) and the survey.
If you initialize the color model with the real color extracted from the hand you are going to track, you shouldn't have any problems with darker skin tones.
For the particle filter, I think you can find some sample code implementations. Good luck.
It will be hard for you to find skin tone based on color only.
First of all, it depends strongly on the automatic white balance algorithm.
For example, in this image, any person can see that the color is skin tone. But for the computer it will be blue.
Second, correct color calibration in digital cameras is a hard thing, and it will be rarely accurate enough for your purposes.
You can see www.DPReview.com, to understand what I mean.
In conclusion, I truly believe that the color by itself can be an input, but it is not enough.
Well, my experience with skin modeling is bad, because:
1) lighting can vary, so skin segmentation is not robust
2) it will also mark your face (as well as other skin-like objects)
I would use machine learning techniques like Haar training, which, in my opinion, is a far better approach than modeling and fixing some constraints (like skin detection + thresholding...).
As a more robust alternative to pixel colour, you can use a hand geometry model. First project the model for a particular gesture, and then cross-correlate it with the source image. Here is a demo of this technique.
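A minimal sketch of that cross-correlation step with OpenCV's template matching (the file names and the acceptance threshold are placeholders):

import cv2

scene = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)
hand_model = cv2.imread('hand_gesture.png', cv2.IMREAD_GRAYSCALE)

# Normalized cross-correlation of the projected hand model with the image
response = cv2.matchTemplate(scene, hand_model, cv2.TM_CCORR_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

# max_loc is the top-left corner of the best match; a threshold on max_val
# decides whether the gesture is actually present
if max_val > 0.8:
    h, w = hand_model.shape
    cv2.rectangle(scene, max_loc, (max_loc[0] + w, max_loc[1] + h), 255, 2)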
I'm learning the basics of OpenCV, and I thought a good project would help me make the studying more fun. After thinking over some ideas, I came up with a material recognition project. Let's say I got myself a conveyor and it's transporting material for the production of some product (the product doesn't really matter). There are three materials, and the illumination conditions will vary (natural light from morning through the afternoon, and a light bulb at night). That would be the problem description.
I was thinking of using sand, wood and rocks, which are easy to get, and placing them on a plastic surface. After taking a picture, I'll compute a histogram to get the color, and using this color I'll identify the material. But since the lighting conditions will change over time, when I take this photograph and compute the histogram, the color will change and the material won't be recognized properly. And I thought: what if I were to use sand and dust? They have a very similar color but different textures. Is there something that can help me with that?
I just want some ideas, and maybe some expert in the field could guide me.
Quite an advanced idea for a starting project. The differences in lighting could be tackled by using the HSV or other color spaces, taking the Hue component. However the matter of "texture" can be handled in two ways:
Feature descriptors: If you deal with the grey level image, there is a set of feature descriptors called the Grey Level Co-occurrence Matrix (GLCM) that gives a measure of the textures of different regions in the image. This is available in Matlab; for OpenCV there is the following code, in C.
So you could take several standard shots of the sand, wood and rocks and use them as training samples for a classifier - NN, SVM, OpenCV's Haar classifier, whatever. Then train it with negative samples. The feature vector for the classifier will be the GLCM output for each picture. Then run it on the actual pictures and see how accurate it is.
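A minimal sketch of that feature-extraction step, using the Haralick features that mahotas computes from the GLCM (OpenCV itself has no GLCM routine); the file names and labels are placeholders:

import cv2
import numpy as np
import mahotas

def texture_features(gray_patch):
    # 13 Haralick features derived from the grey-level co-occurrence matrix,
    # averaged over the 4 GLCM directions
    return mahotas.features.haralick(gray_patch).mean(axis=0)

# Build a small training set from labelled patches of each material
X, y = [], []
for fname, label in [('sand.png', 0), ('wood.png', 1), ('rocks.png', 2)]:
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    X.append(texture_features(gray))
    y.append(label)

# Any classifier (SVM, k-NN, ...) can then be trained on X and y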
Texture Roughness: I came across this useful paper that shows a single-valued measure of the 'roughness' of a texture, called the Eigen Transform. The calculations are quite simple, especially if you use OpenCV's SVD() for eigenvalue calculations. The result of the Eigen-transform gives a value corresponding to the roughness of that portion, which can be used to separate out the required portions.
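Roughly, the idea is to take a small window around each pixel, compute its singular values, and average all but the few largest: smooth regions give small values, rough textures large ones. A naive (and slow) sketch of that reading of the transform, with window size, skip count and file name as placeholders:

import cv2
import numpy as np

def eigen_transform(gray, w=15, skip=3):
    # For each pixel, take a w x w window, compute its singular values,
    # and average all but the `skip` largest ones as a roughness measure
    h, width = gray.shape
    out = np.zeros((h, width), dtype=np.float64)
    half = w // 2
    for i in range(half, h - half):
        for j in range(half, width - half):
            window = gray[i - half:i + half + 1, j - half:j + half + 1].astype(np.float64)
            s = np.linalg.svd(window, compute_uv=False)
            out[i, j] = s[skip:].mean()
    return out

gray = cv2.imread('material_patch.png', cv2.IMREAD_GRAYSCALE)
roughness = eigen_transform(gray)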