Parameters of detectMultiScale in OpenCV using Python


I am not able to understand the parameters passed to detectMultiScale. I know that the general syntax is detectMultiScale(image, rejectLevels, levelWeights).
However, what do the parameters rejectLevels and levelWeights mean? And what are the optimal values for detecting objects?
I want to use this to detect the pupil of the eye.

Amongst these parameters, you need to pay more attention to four of them:
scaleFactor – Parameter specifying how much the image size is reduced at each image scale.
Basically, the scale factor is used to create your scale pyramid. In more detail: your model has a fixed size defined during training, which is visible in the XML. This means that a face of this size is detected in the image if present. However, by rescaling the input image, you can resize a larger face to a smaller one, making it detectable by the algorithm.
1.05 is a good possible value for this. It means you use a small step for resizing, i.e. you reduce the size by 5% at each step, which increases the chance that one of the scales matches the model's size. It also means that the algorithm runs slower, since it is more thorough. You may increase it to as much as 1.4 for faster detection, at the risk of missing some faces altogether.
minNeighbors – Parameter specifying how many neighbors each candidate rectangle should have to retain it.
This parameter affects the quality of the detected faces. A higher value results in fewer detections but with higher quality. A value between 3 and 6 is a good choice.
minSize – Minimum possible object size. Objects smaller than that are ignored.
This parameter determines the smallest object size you want to detect. You decide it! Usually, [30, 30] is a good start for face detection.
maxSize – Maximum possible object size. Objects bigger than this are ignored.
This parameter determines the largest object size you want to detect. Again, you decide it! Usually you don't need to set it manually; the default value assumes you want to detect without an upper limit on the size of the face.
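To make the speed versus thoroughness trade-off of scaleFactor concrete, here is a minimal sketch that lists the scales a sliding-window detector would visit. It is only an illustration: the helper name pyramid_scales, the 24x24 model window and the 640x480 image are assumptions, and OpenCV's internal loop differs in detail.

```python
def pyramid_scales(image_size, window_size, scale_factor):
    """Return the scale multipliers at which the model window still fits the image."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    img_w, img_h = image_size
    win_w, win_h = window_size
    scales = []
    scale = 1.0
    # Grow the effective window until it no longer fits inside the image.
    while win_w * scale <= img_w and win_h * scale <= img_h:
        scales.append(scale)
        scale *= scale_factor
    return scales


# Hypothetical sizes: a 640x480 image and a 24x24 trained window.
fine = pyramid_scales((640, 480), (24, 24), 1.05)
coarse = pyramid_scales((640, 480), (24, 24), 1.4)
print(len(fine), "scales at 1.05 versus", len(coarse), "scales at 1.4")
```

The smaller factor visits many more scales, which is why it is slower but less likely to step over the size at which an object matches the model.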

A code example can be found here:
http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
Regarding the parameter descriptions, you may have quoted old parameter definitions. In fact, you are more likely to be faced with the following parameters:
scaleFactor: Parameter specifying how much the image size is reduced at each image scale.
minNeighbors: Parameter specifying how many neighbors each candidate rectangle should have to retain it.
Here you can find a nice explanation on these parameters:
http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Image_Object_Detection_Face_Detection_Haar_Cascade_Classifiers.php
Make sure to obtain proper pretrained classifier sets for faces and eyes such as
haarcascade_frontalface_default.xml
haarcascade_eye.xml

The OpenCV Class List docs provide the descriptions for all C++ and Python methods.
Here is the one for cv::CascadeClassifier detectMultiScale:
detectMultiScale
Python:
objects = cv.CascadeClassifier.detectMultiScale(image[, scaleFactor[, minNeighbors[, flags[, minSize[, maxSize]]]]])
Parameters:
image         Matrix of the type CV_8U containing an image where objects are detected.
objects       Vector of rectangles where each rectangle contains the detected object; the rectangles may be partially outside the original image.
scaleFactor   Parameter specifying how much the image size is reduced at each image scale.
minNeighbors  Parameter specifying how many neighbors each candidate rectangle should have to retain it.
flags         Parameter with the same meaning for an old cascade as in the function cvHaarDetectObjects. It is not used for a new cascade.
minSize       Minimum possible object size. Objects smaller than that are ignored.
maxSize       Maximum possible object size. Objects larger than that are ignored. If maxSize == minSize model is evaluated on single scale.
Note
(Python) A face detection example using cascade classifiers can be found at opencv_source_code/samples/python/facedetect.py
As noted, a sample usage is available from the OpenCV source code. You can pass in each documented parameter as a keyword.
rects = cascade.detectMultiScale(img,
                                 scaleFactor=1.3,
                                 minNeighbors=4,
                                 minSize=(30, 30),
                                 flags=cv.CASCADE_SCALE_IMAGE)

The detectMultiScale function is used to detect the faces. It returns a list of rectangles, each given as coordinates (x, y, w, h) around a detected face.
It takes 3 common arguments: the input image, scaleFactor, and minNeighbors.
scaleFactor specifies how much the image size is reduced at each scale. In a group photo, some faces may be nearer the camera than others. Naturally, such faces appear larger than the ones behind. This factor compensates for that.
minNeighbors specifies how many neighbours each candidate rectangle should have to retain it, i.e. the number of neighbours a rectangle needs in order to be called a face. You can read about it in detail here. You may have to tweak these values to get the best results.
We obtain these values by trial and error over a specific range.
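The idea behind the neighbour count can be sketched without OpenCV. The code below is a simplified stand-in, not OpenCV's actual grouping routine: the helper names iou and group_candidates, the overlap test, and the exact cut-off are assumptions chosen for illustration. Raw candidate rectangles that overlap strongly are clustered, and a cluster survives only if it has enough members.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / float(aw * ah + bw * bh - inter)


def group_candidates(rects, min_neighbors, min_iou=0.5):
    """Cluster overlapping rectangles; keep clusters with enough members."""
    parent = list(range(len(rects)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if iou(rects[i], rects[j]) >= min_iou:
                parent[find(j)] = find(i)

    clusters = {}
    for i, rect in enumerate(rects):
        clusters.setdefault(find(i), []).append(rect)

    kept = []
    for members in clusters.values():
        if len(members) >= min_neighbors:
            n = len(members)
            # Report the (integer) average rectangle of the cluster.
            kept.append(tuple(sum(r[k] for r in members) // n for k in range(4)))
    return kept


# Three near-identical hits on one object plus one isolated hit.
raw = [(10, 10, 50, 50), (12, 11, 50, 50), (11, 12, 50, 50), (200, 200, 40, 40)]
print(group_candidates(raw, min_neighbors=3))
```

Raising min_neighbors discards the isolated hit, which is the "fewer detections, higher quality" effect described above.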

Related

Skewing text - How to take advantage of existing edges

I have the following JPG image. I want to find the edges where the white page meets the black background, so that I can rotate the contents a few degrees clockwise. My aim is to straighten the text for use with Tesseract OCR conversion. I don't see the need to rotate the text blocks as I have seen in similar examples.
In the Canny Edge Detection docs, the third argument (200 in edges = cv.Canny(img,100,200)) is maxVal, and edges above it are said to be 'sure to be edges'. Is there any way to determine these (max/min) values ahead of time, without a trial and error approach?
I have used code examples which utilize the Python cv2 module, but the edge detection is set up for simpler applications.
Is there any approach I can use to take the text out of the equation? For example, only detecting edge lines greater than a specified length?
Any suggestions would be appreciated.
Below is an example of edge detection (above image, same min/max values). The outer edge of the page is clearly defined. The image is high-contrast black and white, with even lighting. I can't see a need for an adaptive threshold; a simple global one is working. It's just a question of what ratio to use.
I don't have the answer to this yet, but to add: I now have the contours of the above doc.
I used the find contours tutorial with some customization of the file loading. Note: removing words gives a thinner/cleaner outline.

Consider Otsu.
Its chief virtue is that it chooses the threshold automatically
from the image's own histogram, so no hand-tuned value is needed.
In your case, blank margins might be the saving grace.
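For reference, Otsu's method picks a single global threshold by maximising the between-class variance of the grey-level histogram. The following is a minimal NumPy sketch under the assumption of an 8-bit greyscale input; the function name otsu_threshold is hypothetical, and in practice a library routine would normally be used instead.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the level t that best splits an 8-bit image into <= t and > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the dark class
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean of the dark class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                      # skip levels where one class is empty
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))             # level with maximal between-class variance


# Synthetic page: dark background (30) next to a bright page (220).
page = np.full((10, 20), 30, dtype=np.uint8)
page[:, 10:] = 220
t = otsu_threshold(page)
binary = page > t
```

With a high-contrast, evenly lit scan such as the one described in the question, the histogram is strongly bimodal, which is the situation this criterion handles well.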
Consider working on a series of 2x reduced resolution images,
where new pixel is min() (or even max()!) of original four pixels.
These reduced images might help you to focus on the features
that matter for your use case.
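A 2x reduction of this kind can be sketched with a reshape, assuming a single-channel NumPy array. The helper name downsample2x is hypothetical; odd trailing rows or columns are simply dropped.

```python
import numpy as np


def downsample2x(gray, reducer=np.min):
    """Halve the resolution; each output pixel reduces one 2x2 block."""
    h, w = gray.shape
    h2, w2 = (h // 2) * 2, (w // 2) * 2        # drop an odd trailing row/column
    blocks = gray[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return reducer(blocks, axis=(1, 3))


sample = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])
darkest = downsample2x(sample, np.min)    # keeps dark ink on a light page
brightest = downsample2x(sample, np.max)  # keeps light features instead
```

Calling the function repeatedly on its own output gives the series of progressively smaller images.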
The usual way to deskew scanned text is to binarize and
then keep changing theta until "sum of pixels across raster"
is zero, or small. In particular, with few descenders
and decent inter-line spacing, we will see "lots" of pixels
on each line of text and "near zero" between text lines,
when theta matches the original printing orientation.
Which lets us recover (1.) pixels per line, and (2.) inter-line spacing, assuming we've found a near-optimal theta.
In your particular case, focusing on the ... leader dots
seems a promising approach to finding the globally optimal
deskew correction angle. Discarding large rectangles of
pixels in the left and right regions of the image could
actually reduce noise and enhance the accuracy of
such an approach.
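The theta search can be sketched in plain NumPy. Two assumptions are made here that are not in the answer: a small rotation is approximated by a vertical shear (each column is shifted up or down), and the "lots of pixels on text rows, near zero between them" criterion is scored as the variance of the row sums. The function names are hypothetical.

```python
import numpy as np


def shear_rows(binary, angle_deg):
    """Approximate a small rotation by shifting each column vertically."""
    h, w = binary.shape
    out = np.zeros_like(binary)
    slope = np.tan(np.radians(angle_deg))
    for x in range(w):
        shift = int(round((x - w / 2) * slope))
        out[:, x] = np.roll(binary[:, x], shift)
    return out


def profile_score(binary):
    """Variance of row sums: large when text rows and gaps separate cleanly."""
    return float(np.var(binary.sum(axis=1)))


def estimate_correction(binary, max_angle=5.0, step=0.25):
    """Return the shear angle (degrees) that best straightens the text lines."""
    angles = np.arange(-max_angle, max_angle + step, step)
    scores = [profile_score(shear_rows(binary, a)) for a in angles]
    return float(angles[int(np.argmax(scores))])


# Synthetic "page": horizontal text lines, then skewed by 3 degrees.
page = np.zeros((200, 300), dtype=np.uint8)
for row in range(20, 180, 20):
    page[row:row + 3, :] = 1
skewed = shear_rows(page, 3.0)
correction = estimate_correction(skewed)
```

On real scans the binarized image would be passed in place of the synthetic page, and cropping to a region such as the leader dots would simply mean slicing the array before the search.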

Detecting objects in images using SciKit with python

I have an image processing problem that I'm struggling to figure out a solution for. Here is the image. Basically it's a segmentation and counting problem using scikit-image in Python. I have to write pseudocode for how I would go about counting these "rectangle" objects in a source image that I have. The rectangles are surrounded by other objects of different shapes and sizes. Recently I did a similar beginner problem to count the number of coins in an image. That one was much easier because all of the objects were of the same nature.
Could any of you help me with ideas of how to go about counting the scissors, separating and isolating them from all of the other objects in the image? My thought process so far is to:
read in image
convert to grayscale
plot a histogram
from this, threshold, preferably using Otsu
remove all unwanted objects that touch the border using skimage clear_border
However, unlike the coins, which are simple and all nearly identical, I don't know how to go about isolating the rectangle objects. Are there any advanced segmentation techniques in skimage that could be used for this? I was thinking of blob detection, but I don't think that will work here. If anyone could provide any insight, please let me know; I would be very grateful.

It depends how general you need your solution to be. In the image you showed, the scissors are the only objects that have two holes in them. We can use the skimage.measure.regionprops property euler_number, described in the documentation as:
Euler characteristic of the set of non-zero pixels. Computed as number of connected components subtracted by number of holes (input.ndim connectivity). In 3D, number of connected components plus number of holes subtracted by number of tunnels.
So, for scissors, that will be 1-2 = -1, whereas for solid objects it's 1 and for objects with 1 hole it's 1-1 = 0. So you can say:
from skimage import measure

# borders_cleared: the binary image left after thresholding and clear_border
objects = measure.label(borders_cleared)
props_list = measure.regionprops(objects)
num_scissors = 0
for props in props_list:  # one RegionProps object per region
    if props.euler_number == -1:  # one component minus two holes
        num_scissors += 1
When the segmentation itself is easy, as in the image you showed, then my strategy would always be to find a property or combination of properties in regionprops that allows me to distinguish the objects I'm interested in from the others. This could be size, elongation, roundness, ... Using the extra_properties= keyword argument, you can even compute other properties defined by any function you can imagine.
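To see why the Euler number separates a two-hole shape from solid or one-hole shapes, here is a small NumPy-only sketch that computes it for a whole binary image by counting 2x2 pixel patterns ("bit quads"). This is an illustration of the quantity, not a claim about how skimage computes it; the function name euler_number_2d is hypothetical.

```python
import numpy as np


def euler_number_2d(binary, connectivity=8):
    """Components minus holes of a 2D binary image, via 2x2 pattern counts."""
    p = np.pad(np.asarray(binary, dtype=int), 1)   # zero border around the image
    a, b, c, d = p[:-1, :-1], p[:-1, 1:], p[1:, :-1], p[1:, 1:]
    total = a + b + c + d
    q1 = int(np.sum(total == 1))                   # windows with one set pixel
    q3 = int(np.sum(total == 3))                   # windows with three set pixels
    qd = int(np.sum((total == 2) & (a == d)))      # two set pixels on a diagonal
    sign = -2 if connectivity == 8 else 2
    return (q1 - q3 + sign * qd) // 4


solid = [[1, 1, 1],
         [1, 1, 1]]
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
scissors_like = [[1, 1, 1, 1, 1],
                 [1, 0, 1, 0, 1],
                 [1, 1, 1, 1, 1]]
print(euler_number_2d(solid), euler_number_2d(ring), euler_number_2d(scissors_like))
```

Applied to the cropped mask of a single labelled region, this gives the same kind of per-object value that the loop above tests against -1.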

Converting an AutoCAD model to a matrix of points/volumes with the mass density specified at each location

I am an experimental physicist (grad student) who is trying to take an AutoCAD model of the experiment I've built and find the gravitational potential from the whole instrument over a specified volume. Before I find the potential, I'm trying to make a map of the mass density at each point in the model.
What's important is that I already have a model, and in the end I'll have something that says "At (x,y,z) the value is d". If that's a crazy CSV file, a numpy array, an Excel sheet, or... whatever, I'll be happy.
Here's what I've come up with so far:
Step 1: I color code the AutoCAD file so that color associates with material.
Step 2: I send the new drawing/model to a slicer (made for 3D printing). This takes my 3D object and turns it into equally spaced (in z-direction) 2d objects... but then that's all output as g-code. But hey! G-code is a way of telling a motor how to move.
Step 3: This is the 'hard part' and the meat of this question. I'm thinking that I take that g-code, which is in essence just a set of instructions on how to move a nozzle and use it to populate a numpy array. Basically I have 3D array, each level corresponds to one position in z, and the grid left is my x-y plane. It reads what color is being put where, and follows the nozzle and puts that mass into those spots. It knows the mass because of the color. It follows the path by parsing the g-code.
When it is done with that level, it moves to the next grid and repeats.
Does this sound insane? Better yet, does it sound plausible? Or maybe someone has a smarter way of thinking about this.
Even if you just read all that, thank you. Seriously.

Does this sound insane? Better yet, does it sound plausible?
It's very reasonable and plausible. Using the g-code could do that, but it would require a g-code interpreter that could map the instructions to a 2D path. (Not 3D, since you mentioned that you're taking fixed z-slices.) That could be problematic, but if you found one it could work, though it may require some parser manipulation. There are several of these, in a variety of languages, that could be useful.
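As a rough sketch of what such an interpreter would have to do, the code below reads a handful of g-code lines and marks the grid cells touched by extruding moves, one boolean grid per z height. It rests on several assumptions that may not hold for a given slicer: absolute, non-negative coordinates, only G0/G1 moves, and an E word meaning "material is being deposited". The function name rasterize_gcode is hypothetical, and material or colour is not tracked here; per-tool material would need extra parsing.

```python
import re

import numpy as np

WORD = re.compile(r"([GXYZE])\s*(-?\d+(?:\.\d*)?)")


def rasterize_gcode(lines, grid_shape, cell_mm=1.0):
    """Return {z: boolean grid} with cells touched by extruding G1 moves."""
    layers = {}
    x = y = z = 0.0
    for line in lines:
        code = line.split(";", 1)[0].upper()       # strip trailing comments
        words = {k: float(v) for k, v in WORD.findall(code)}
        if words.get("G") not in (0.0, 1.0):
            continue                               # ignore everything but moves
        nx, ny, nz = words.get("X", x), words.get("Y", y), words.get("Z", z)
        if words["G"] == 1.0 and "E" in words:
            grid = layers.setdefault(nz, np.zeros(grid_shape, dtype=bool))
            # Sample the segment densely enough to hit every cell it crosses.
            steps = int(max(abs(nx - x), abs(ny - y)) / cell_mm) * 2 + 1
            for t in np.linspace(0.0, 1.0, steps + 1):
                col = int((x + t * (nx - x)) / cell_mm)
                row = int((y + t * (ny - y)) / cell_mm)
                if 0 <= row < grid_shape[0] and 0 <= col < grid_shape[1]:
                    grid[row, col] = True
        x, y, z = nx, ny, nz
    return layers


# Hypothetical toolpath: two extruded lines on one layer, with a travel move between.
program = ["G0 X0 Y0 Z0.2", "G1 X4 Y0 E1.0 ; first line", "G0 X0 Y2", "G1 X4 Y2 E2.0"]
layers = rasterize_gcode(program, grid_shape=(5, 5))
```

Stacking the per-layer grids in z order and multiplying by a density would give the 3D array described in the question.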
SUGGESTION
From what you describe, it's akin to doing an MRI scan of the object, and trying to determine its constituent mass profile along a given axis. In this case, and unlike MRI, you have multiple colors, so that can be used to your advantage in region selection / identification.
Even if you used a g-code interpreter, it would reproduce an image whose area you'll still have to calculate, so noting that and given that you seek to determine and classify material composition by path (in that the path defines the boundary of a particular material, which has a unique color), there may be a couple ways to approach this without resorting to g-code:
1) If the colors of your material are easily (or reasonably) distinguishable, you can create a color mask which will quantify the occupied area, from which you can then determine the mass.
That is, if you take a photograph of the slice, load the image into a numpy array, and then search for a specific value (say red), you can identify the area of the region. Then, you apply a mask on your array. Once done, you count the occupied elements within your array, and then you divide it by the array size (i.e. rows by columns), which would give you the relative area occupied. Since you know the mass of the material, and there is a constant z-thickness, this will give you the relative mass. An example of color masking using numpy alone is shown here: http://scikit-image.org/docs/dev/user_guide/numpy_images.html
As such, let's define an example that's analogous to your problem - let's say we have a picture of a red cabbage, and we want to know how much of the picture contains red / purple-like pixels.
To simplify our life, we'll set any pixel whose red channel is below a certain threshold to white (RGB: 255,255,255), and then count how many non-white pixels there are:
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plt

def plot_image(fname, color=128, replacement=(255, 255, 255), plot=False):
    # 128 is a reasonable guess since most of the pixels in the image that have the
    # purplish hue have RGB values above this value.
    data = plt.imread(fname)  # any reader that returns an RGB array will do
    image_data = deepcopy(data)  # copy the original data (for later use if need be)
    mask = image_data[:, :, 0] < color  # pixels whose red channel is below the threshold
    image_data[mask] = np.array(replacement)  # replace the matches with white
    if plot:
        plt.imshow(image_data)
        plt.show()
    return data, image_data

data, image_data = plot_image('cabbage.jpg')  # load the image, and apply the mask
# Find the locations of all the values that are non-white (i.e. not 255).
# This returns 3 arrays of the same size.
indices = np.where(image_data != 255)
# Now, calculate the area: in this case, ~ 62.04 %
effective_area = indices[0].size / float(data.size)
The selected region in question is shown here below:
Note that image_data contains the pixel information that has been masked, and would provide the coordinates (albeit in pixel space) of where each occupied (i.e. non-white) pixel occurs. The issue with this, of course, is that these are pixel coordinates and not physical ones. But since you know the physical dimensions, converting between the two is easily done.
Furthermore, with the effective area known, and knowledge of the physical dimension, you have a good estimate of the real area occupied. To obtain better results, tweak the value of the color threshold (i.e. color). In your real-life example, since you know the color, search within a pixel range around that value (to offset noise and lighting issues).
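Searching within a range around a known colour, and turning the resulting area fraction into a mass, might look like the sketch below. The function names, the tolerance of 20 levels per channel, and the numbers in the example are all assumptions for illustration.

```python
import numpy as np


def color_fraction(image, target_rgb, tolerance=20):
    """Mask pixels within `tolerance` of target_rgb on every channel."""
    diff = np.abs(image.astype(int) - np.array(target_rgb))
    mask = np.all(diff <= tolerance, axis=-1)
    return mask, float(mask.mean())               # mask and fraction of the image


def slice_mass(fraction, slice_area, thickness, density):
    """Mass of one material in one slice (units must be consistent)."""
    return fraction * slice_area * thickness * density


# Hypothetical slice image: top half is "almost red", bottom half is black.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:2, :] = (250, 5, 5)
mask, fraction = color_fraction(image, target_rgb=(255, 0, 0))
mass = slice_mass(fraction, slice_area=100.0, thickness=1.0, density=0.002)
```

The boolean mask also gives the pixel coordinates of the material, so the same array can be scaled to physical positions once the pixel size is known.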
The above method is a bit crude - but effective - and, it may be worth exploring using it in tandem with edge-detection, as that could help improve the region identification, and area selection. (Note that isn't always strictly true!) Also, color deconvolution may be useful: http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_ihc_color_separation.html#sphx-glr-auto-examples-color-exposure-plot-ihc-color-separation-py
The downside to this is that the analysis requires a high-quality image and good lighting; and, most importantly, it's likely that you'll lose some of the finer details of the edges, which would impact your masses.
2) Instead of resorting to camera work, and given that you have the AutoCAD model, you can use that and the software itself in addition to the above prescribed method.
Since you've colored each material in the model differently, you can use AutoCAD's slicing tool, and can do something similar to what the first method suggests doing physically: slicing the model, and taking pictures of the slice to expose the surface. Then, using a similar method described above of color masking / edge detection / region determination through color selection, you should obtain a much better and (arguably) very accurate result.
The downside to this, is that you're also limited by the image quality used. But, as it's software, that shouldn't be much of an issue, and you can get extremely high accuracy - close to its actual result.
The last suggestion to improve these results would be to script numerous random thin slices of the AutoCAD model along a particular directional vector shared by every subsequent slice, export each exposed surface, analyze each image in the manner described above, and then collect those results to give you a Monte Carlo-like and statistically quantifiable determination of the mass (to correct for geometry effects due to slicing along one given axis).

Clipping image/remove background programmatically in Python

How to go from the image on the left to the image on the right programmatically using Python (and maybe some tools, like OpenCV)?
I made this one by hand using an online tool for clipping. I am a complete noob in image processing (especially in practice). I was thinking of applying some edge or contour detection to create a mask, which I would then apply to the original image to paint everything else (except the region of interest) black. But I failed miserably.
The goal is to preprocess a dataset of very similar images, in order to train a CNN binary classifier. I tried to train it by just cropping the image close to the region of interest, but the noise is so high that the CNN learned absolutely nothing.
Can someone help me do this preprocessing?

I used OpenCV's implementation of the watershed algorithm to solve your problem. You can find out how to use it if you read this great tutorial, so I will not explain it in much detail.
I selected four points (markers). One is located on the region that you want to extract, one is outside and the other two are within lower/upper part of the interior that does not interest you. I then created an empty integer array (the so-called marker image) and filled it with zeros. Then I assigned unique values to pixels at marker positions.
The image below shows the marker positions and marker values, drawn on the original image:
I could also select more markers within the same area (for example several markers that belong to the area you want to extract) but in that case they should all have the same values (in this case 255).
Then I used watershed. The first input is the image that you provided and the second input is the marker image (zero everywhere except at marker positions). The algorithm stores the result in the marker image; the region that interests you is marked with the value of the region marker (in this case 255):
I set all pixels that did not have the 255 value to zero. I dilated the obtained image three times with a 3x3 kernel. Then I used the dilated image as a mask for the original image (I set all pixels outside the mask to zero) and this is the result I got:
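The post-processing after watershed can be sketched with NumPy alone. This assumes `markers` is the integer label image that the watershed step produced, with 255 marking the wanted region; the shift-based dilation stands in for an image library's dilate call, and the helper names are hypothetical.

```python
import numpy as np


def dilate3x3(mask, iterations=1):
    """Binary dilation with a 3x3 square kernel, repeated `iterations` times."""
    mask = mask.astype(bool)
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(mask, 1)                   # False border
        grown = np.zeros_like(mask)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        mask = grown
    return mask


def apply_region_mask(image, markers, label=255, iterations=3):
    """Black out everything outside the (dilated) region with the given label."""
    mask = dilate3x3(markers == label, iterations)
    result = image.copy()
    result[~mask] = 0
    return result


# Tiny stand-in data: a grey image and a label map with one wanted pixel.
image = np.full((9, 9, 3), 200, dtype=np.uint8)
markers = np.zeros((9, 9), dtype=np.int32)
markers[4, 4] = 255
clipped = apply_region_mask(image, markers)
```

The dilation grows the region slightly so that the boundary pixels of the object are not cut off by the mask.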
You will probably need some kind of method that will find markers automatically. The difficulty of this task depends heavily on the set of input images. In some cases, the method can be really straightforward and simple (as in the tutorial linked above), but sometimes this can be a tough nut to crack. I can't recommend anything because I don't know what your images look like in general (you only provided one). :)

Detect the location of an image within a larger image

How do you detect the location of an image within a larger image? I have an unmodified copy of the image. This image is then changed to an arbitrary resolution and placed randomly within a much larger image which is of an arbitrary size. No other transformations are conducted on the resulting image. Python code would be ideal, and it would probably require libgd. If you know of a good approach to this problem you'll get a +1.

There is a quick and dirty solution, and that's simply sliding a window over the target image and computing some measure of similarity at each location, then picking the location with the highest similarity. Then you compare the similarity to a threshold: if the score is above the threshold, you conclude the image is there and that's the location; if the score is below the threshold, then the image isn't there.
As a similarity measure, you can use normalized correlation or sum of squared differences (aka L2 norm). As people mentioned, this will not deal with scale changes. So you also rescale your original image multiple times and repeat the process above with each scaled version. Depending on the size of your input image and the range of possible scales, this may be good enough, and it's easy to implement.
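A brute-force version of this search, using sum of squared differences over a few candidate scales, can be sketched in NumPy. The nearest-neighbour resize, the per-pixel normalisation of the score (so that different scales are comparable), and all function names are assumptions; the loops are slow and only meant for small greyscale arrays.

```python
import numpy as np


def resize_nearest(img, scale):
    """Nearest-neighbour resize of a 2D array by a scale factor."""
    h, w = img.shape
    nh, nw = max(1, int(round(h * scale))), max(1, int(round(w * scale)))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[np.ix_(rows, cols)]


def best_match(target, template):
    """Slide the template over the target; return (lowest SSD, (row, col))."""
    th, tw = template.shape
    best = (np.inf, (0, 0))
    for y in range(target.shape[0] - th + 1):
        for x in range(target.shape[1] - tw + 1):
            ssd = float(np.sum((target[y:y + th, x:x + tw] - template) ** 2))
            if ssd < best[0]:
                best = (ssd, (y, x))
    return best


def locate(target, original, scales):
    """Try each scale; return (mean squared error, scale, (row, col)) of the best."""
    results = []
    for s in scales:
        template = resize_nearest(original, s)
        if template.shape[0] > target.shape[0] or template.shape[1] > target.shape[1]:
            continue
        ssd, pos = best_match(target, template)
        results.append((ssd / template.size, s, pos))
    return min(results)


# Synthetic test: paste a 1.5x copy of a random patch into a random scene.
rng = np.random.default_rng(0)
original = rng.random((8, 8))
scene = rng.random((30, 40))
scene[5:17, 20:32] = resize_nearest(original, 1.5)
score, scale, position = locate(scene, original, scales=[0.5, 1.0, 1.5, 2.0])
```

Because a lower score is better here, the threshold test from the previous paragraph becomes "accept the match if the score is below the threshold".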
A proper solution is to use affine invariants. Try looking up "wide-baseline stereo matching", people looked at that problem in that context. The methods that are used are generally something like this:
1) Preprocessing of the original image.
Run an "interest point detector". This will find a few points in the image which are easily localizable, e.g. corners. There are many detectors; a detector called "harris-affine" works well and is pretty popular (so implementations probably exist). Another option is to use the Difference-of-Gaussians (DoG) detector; it was developed for SIFT and works well too.
At each interest point, extract a small sub-image (e.g. 30x30 pixels).
For each sub-image, compute a "descriptor", some representation of the image content in that window. Again, many descriptors exist. Things to look at are how well the descriptor describes the image content (you want two descriptors to match only if they are similar) and how invariant it is (you want it to be the same even after scaling). In your case, I'd recommend using SIFT. It is not as invariant as some other descriptors, but can cope with scale well, and in your case scale is the only thing that changes.
At the end of this stage, you will have a set of descriptors.
2) Testing (with the new test image).
First, you run the same interest point detector as in step 1 and get a set of interest points. You compute the same descriptor for each point, as above. Now you have a set of descriptors for the target image as well.
Next, you look for matches. Ideally, for each descriptor from your original image, there will be some pretty similar descriptor in the target image. (Since the target image is larger, there will also be "leftover" descriptors, i.e. points that don't correspond to anything in the original image.) So if enough of the original descriptors match with enough similarity, then you know the target is there. Moreover, since the descriptors are location-specific, you will also know where in the target image the original image is.

You probably want cross-correlation. (Autocorrelation is correlating a signal with itself; cross correlating is correlating two different signals.)
What correlation does for you, over simply checking for exact matches, is that it will tell you where the best matches are, and how good they are. Flip side is that, for a 2-D picture, it's something like O(N^3), and it's not that simple an algorithm. But it's magic once you get it to work.
EDIT: Aargh, you specified an arbitrary resize. That's going to break any correlation-based algorithm. Sorry, you're outside my experience now and SO won't let me delete this answer.

http://en.wikipedia.org/wiki/Autocorrelation is my first instinct.

Take a look at Scale-Invariant Feature Transforms; there are many different flavors that may be more or less tailored to the type of images you happen to be working with.
