OpenCV HOG Descriptor Parameters - python

I am trying to detect people from a camera feed using cv2.HOGDescriptor() with its default people detector.
The detector more or less works, but I am having trouble understanding what values to assign to winStride, padding, scale and groupThreshold.
Currently, the camera feed's frame size is 1280 x 720, I resize it to 400 x 400, and then I perform detectMultiScale with the parameters
hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05, 'finalThreshold': 2}
Based on this answer, I understand what these parameters do and represent.
My question is: is there a way of mapping image size to these values? A mathematical equation? An estimation method? I am not necessarily asking for a concrete formula, or even a method that gives all the values, but for something better than trial and error or magic numbers.
Most references and tutorials simply use magic numbers without explaining how they arrived at them.
PS: Here's a visual aid in case you're still not sure of my question

There is no silver bullet here. It is unfortunately very hand-wavy, because the optimal settings vary from one input dataset to another.
Here is a little extra guidance:
If stride > window size, your detector might not even be run on the person. I always think of stride in relation to the window size, e.g. 64/8.
If scale ~ 1, not much will happen. Values like 1.2 or 1.3 are usually better. This parameter essentially scales the image down and then runs the detector again. The hope is that if people were too big for the detector in the first run, they might be the right size after scaling down. E.g. if your detector size is the default 64x128 but some person in the image is 150 px high, the detector might not recognise them as a person because it can only see the legs or the torso at once. If we scale down, 150 / 1.2 = 125, and the person might now actually be detected. (Silly numbers; it is very plausible that the person would be detected at 150 px. But you get the idea.)
The best way to go about it is to experiment a little. Choose some images/video that you think are representative of your use case, create an end-to-end setup, and play around with a couple of different parameter settings. If people are not detected, think about their sizes in relation to your detector size. Are they bigger than that? Smaller? If they are smaller, perhaps increase the scale factor or the number of levels. If they are bigger, downscale the input image more.
...1280 x 720 and I resize it to 400 x 400...
Side note: if you are simply resizing without cropping, you will distort the image and get bad results. Either resize while keeping the aspect ratio, e.g. to 711x400, or crop the initial image to a square before resizing.
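To make the experimentation concrete, here is a minimal sketch of such an end-to-end setup: it resizes each frame while preserving the aspect ratio and then runs the default HOG people detector with a given parameter set. The parameter values are the ones from the question and are only a starting point, not a recommendation.
import cv2

# Default OpenCV people detector (64x128 window)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Starting values from the question; tune these against your own footage
hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05, 'finalThreshold': 2}

def detect_people(frame, target_height=400):
    # Resize while keeping the aspect ratio, e.g. 1280x720 -> 711x400
    h, w = frame.shape[:2]
    new_w = int(w * target_height / h)
    resized = cv2.resize(frame, (new_w, target_height))
    rects, weights = hog.detectMultiScale(resized, **hogParams)
    return resized, rects

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    resized, rects = detect_people(frame)
    for (x, y, w, h) in rects:
        cv2.rectangle(resized, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('people', resized)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Swap in different hogParams dictionaries and compare the detections on the same footage; that is the kind of systematic experiment the answer above recommends.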

Related

Is it possible to calculate the real-world distance of an object in an image without reference objects?

I have a picture of a human eye taken roughly 10 cm away using a mobile phone (no specifications regarding the camera). After some detection and contouring, I got 113 px as the Euclidean distance between the center of the detected iris and the outermost edge of the iris in the image. Dimensions of the image: 483x578 px.
I tried converting pixels to mm by simply multiplying the number of pixels by the size of a pixel in mm, since 1 px is roughly equal to 0.264 mm. That only gives the proper length if the image is at a 1:1 ratio with respect to the real eye, which is not the case here.
Edit:
Device used: One Plus 7T
Field of view = 117 degrees
Aperture = f/2.2
Distance photo was taken = 10 cm (approx)
Question:
Is there an optimal way to find the real-world radius of this particular eye with the information I have gathered through processing so far, without including a reference object in the image?
P.S. The actual HVID of the volunteer's iris is 12.40 mm, measured using Sirus (a high-end device for measuring iris radius; I'm trying to simulate the same measurement using Python and OpenCV).
After months I was able to come up with a result, after a ton of research and lots of trial and error. This is not the most ideal answer, but it gave me the expected results with decent precision.
Simply put, in order to measure object size/distance from an image we need several parameters. In my case, I was trying to measure the diameter of an iris from a smartphone camera.
To make that possible, we need to know the following details prior to the calculation:
1. The size of the physical sensor (height and width, usually in mm)
(the camera inside the smartphone; its details can be obtained from websites on the internet, but you need to know the exact brand and model of the smartphone used)
Note: You cannot use random values for these, otherwise you will get inaccurate results. Every step/constraint must be considered carefully.
2. The size of the image taken (pixels)
Note: The size of the image can easily be obtained using img.shape, but make sure the image is not cropped. This method relies on the total width/height of the original smartphone image, so any modifications/inconsistencies will result in inaccurate results.
3. Focal length of the physical sensor (mm)
Note: Info regarding the focal length of the sensor can be acquired from the internet, and random values should not be used. Make sure you take the images with the autofocus feature disabled so the focal length is preserved. If autofocus is on, the focal length will be constantly changing and the results will be all over the place.
4. The distance at which the image is taken (very important)
Note: As Christoph Rackwitz said in the comment section, the distance from which the image is taken must be known and should not be arbitrary. Guessing a number as input will always result in inaccuracy. Make sure you properly measure the distance from the sensor to the object using some sort of measuring tool. There are some depth-estimation algorithms out there on the internet, but they are not accurate in most cases and need to be calibrated after every single try. That is indeed an option if you don't have a setup for taking consistent photos, but inaccuracies are inevitable, especially for an object like the iris, which requires medical precision.
Once you have gathered all this "proper" information, the rest is to plug it into two very simple equations derived from similar triangles:
Object height/width on sensor (mm) = Sensor height/width (mm) × Object height/width (pixels) / Image height/width (pixels)
Real object height/width (in units) = Distance to object (in units) × Object height/width on sensor (mm) / Focal length (mm)
In the first equation, you must decide along which axis you want to measure. For instance, if the image is taken in portrait and you are measuring the width of the object in the image, then input the width of the image in pixels and the width of the sensor in mm.
The image height/width in pixels is simply the size of the image from step 2.
You must also obtain the object's size in pixels by some means (in this case, detection and contouring).
If you are taking the image in landscape, make sure you are passing the correct width and height.
Equation 2 is pretty simple as well.
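Here is a minimal sketch of those two equations in Python. The numbers below are placeholders for illustration only (they are not real OnePlus 7T specifications); substitute the actual values for your device and setup.
def real_object_size(sensor_mm, image_px, object_px, focal_mm, distance_mm):
    # Equation 1: project the object onto the sensor (similar triangles)
    object_on_sensor_mm = sensor_mm * object_px / image_px
    # Equation 2: scale back up using the distance to the object
    return distance_mm * object_on_sensor_mm / focal_mm

# Placeholder values for illustration only
sensor_width_mm = 5.6      # physical sensor width
image_width_px = 4000      # width of the (uncropped) photo
iris_width_px = 226        # measured object width in pixels, e.g. 2 * 113
focal_length_mm = 4.8      # real focal length, not the 35mm-equivalent value
distance_mm = 100          # camera-to-eye distance (10 cm)

iris_mm = real_object_size(sensor_width_mm, image_width_px, iris_width_px,
                           focal_length_mm, distance_mm)
print(f"Estimated iris diameter: {iris_mm:.2f} mm")  # meaningless until real inputs are used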
Things to consider:
No magnification (digital magnification can destroy any depth info)
No autofocus (already explained)
No cropping, resizing or editing of the image size (already explained)
No image skewing (rotating the image can make it unusable)
Do not substitute random values for any of these inputs (golden advice)
Do not tilt the camera while taking the images (tilting distorts the image, so the measured object height/width will be altered)
Make sure the object and the camera are aligned along the same line
Don't use the image's EXIF data (its depth information is not accurate at all, so do not rely on it)
Things I'm still unsure about:
Lens distortion / Manufacturing defects
Effects of field of view
Perspective Foreshortening due to camera tilt
Depth field cameras
DISCLAIMER: There are multiple ways to solve this problem, but I chose this method, and I highly recommend exploring more and seeing what you can come up with. You can basically extend this idea to measure pretty much any object using a smartphone (within the limits of the images a normal smartphone can take).
(Please don't try to measure the size of an amoeba with this. It simply won't work, but you can still use some of the advice I have given to your advantage.)
If you have cool ideas or issues with my answer, please feel free to let me know; I would love to have a discussion. Feel free to correct me if I have made any mistakes or misunderstood any of these concepts.
Final Note:
No matter how hard you try, you cannot make a smartphone work and behave like a camera sensor that is specifically designed for measurement. A smartphone will never beat those, but we can manipulate the smartphone camera to achieve similar results up to a certain degree. Keep this in mind; I learnt it the hard way.

Is it possible to turn a low quality image into a high quality one with Python?

I made a TIF image based on a 3D model of a wood sheet. (x, y, z) represents a point in 3D space; I simply map (x, y) to a pixel position in the image and z to the greyscale value of that pixel. It worked as I had imagined. Then I ran into a low-resolution problem when I tried to print it: the TIF image gets badly pixelated as soon as I zoom out. My research suggests that I need to increase the resolution of the image, so I tried a few super-resolution algorithms found in online sources, including this one: https://learnopencv.com/super-resolution-in-opencv/
The final image did get a lot bigger in resolution (10+ times larger in either dimension), but the same problem persists: it gets pixelated as soon as I zoom out, just about the same as the original image.
It looks like the quality of an image depends not only on its resolution but also on something else. By quality I mean how clear the wood texture is in the image, and how sharp/clear the texture remains when I enlarge it. Can anyone shed some light on this? Thank you.
original tif
The algorithm-generated TIF is too large to be included here (32 MB)
Gigapixel enhanced tif
Update - Here is a recently achieved result: with a GAN-based solution
It has restored/invented some of the wood grain details. But the models need to be retrained.
In short, it is possible to do this via deep-learning reconstruction like the Super Resolution package you referred to, but you should understand what something like this is trying to do and whether it is fit for purpose.
Generic algorithms like that Super Resolution model are trained on a variety of images to "guess" at details that are not present in the original image, typically using generative training methods such as using low- vs high-resolution versions of the same image as training data.
Using a contrived example, let's say you are trying to up-res a picture of someone's face (CSI zoom-and-enhance style!). From the algorithm's perspective, if a black circle is always present inside a white blob of a certain shape (i.e. a pupil in an eye), then the next time the algorithm sees the same shape it will guess that there should be a black circle and fill in a black pupil. However, this does not mean that there is detail in the original photo that suggests a black pupil.
In your case, you are trying to do a very specific type of up-resing, and algorithms trained on generic data will probably not be good for this type of work. They will be trying to "guess" what detail should be added, but based on a very generic and diverse set of source data.
If this is a long-term project, you should look to train your algorithm on your specific use case, which will definitely yield much better results. Otherwise, simple algorithms like smoothing will help make your image less "blocky", but they will not be able to "guess" details that aren't present.
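For reference, here is a minimal sketch of how the OpenCV super-resolution route looks in code, assuming the opencv-contrib-python package and a separately downloaded pretrained model file (the EDSR_x4.pb path and filenames below are placeholders). This only upsamples; it cannot invent wood-grain detail that a generic model was never trained to reproduce.
import cv2

# Requires opencv-contrib-python; the model file must be downloaded separately
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")        # placeholder path to a pretrained EDSR model
sr.setModel("edsr", 4)            # model name and upscaling factor

img = cv2.imread("woodsheet.tif")  # placeholder filename
upscaled = sr.upsample(img)
cv2.imwrite("woodsheet_x4.tif", upscaled)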

Measure the rate of growth of a crack from Video

My experiment involves subjecting a substance to pressure that makes the substance eventually crack. The crack grows with time and pressure applied. I have a set-up to take a picture of the substance at fixed intervals of time.
I need to measure how fast the crack grows. How do I go about this? (I can code in Python.)
Is there a way to measure the live speed, or the speed of growth of the crack from one frame to another?
Google drive link to series of pictures taken - https://drive.google.com/open?id=189cv8B4rm3lhSgT6OYfI_aN0Xmqi-tYi
Kindly advise.
I tried floodFill from OpenCV as per the suggestions to this question, but the returned mask is as shown:
import cv2
import numpy as np
# 'resized' is the resized input frame prepared earlier
h, w = resized.shape[:2]
mask = np.zeros((h + 2, w + 2), np.uint8)
seed = (int(w / 2), int(h / 2))  # image centre (computed but not used below)
# Assumed flags: 4-connectivity, fill the mask only, write 255 into the mask
floodflags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)
# Floodfill from point (0, 0)
num, im, mask, rect = cv2.floodFill(resized, mask, (0, 0), (255, 0, 0), (10,) * 3, (10,) * 3, floodflags)
I thought that if I could get the coordinates of the bounding box that encloses the crack, I could track those coordinates across frames, measure the size of the crack, and eventually its speed.
I tried thresholding as below:
th, im_th = cv2.threshold(im, 100, 255, cv2.THRESH_BINARY)
This gives:
I'm unsure whether this will let me filter out the background and draw a bounding box around the crack alone. Please advise.
Thanks in advance.
Depending on how slowly the crack forms, you probably don't need video; you'll likely wind up sampling every X frames anyway and throwing the extra frames away. What you want is enough frames to capture "incremental" changes in the crack, without so many frames that processing becomes too computationally expensive.
If you can carefully control the lighting conditions in your setup, then you're in luck! This becomes a very simple problem. You can take a histogram of the pixels (openCV has handles for this, but so does PIL and numpy); you should get two families of color; one that is the color of the outside of the substance, and another that is biased by the shadow in the crack.
You can also try dramatically increasing the contrast in each image/frame in order to get a binary mask of the crack, or running an edge detector over the image. These techniques will lead to frames that are substantially easier to process than the raw footage. You can even feed these into a skeletonization process in order to generate a vector-based representation of the line, in XY image coordinates.
If you can't control the lighting, or the sample is a similar color to the crack, you'll probably need to use object detection techniques, but it's unlikely there's an existing "crack detector," so you may either need to build your own, or look for what other detectors serve as a good proxy for the color and shape of the forming crack.
I'd highly recommend trying the first option if at all possible; pixel and histogram math is far easier than other techniques.
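As a starting point, here is a minimal sketch of that histogram/threshold route under the controlled-lighting assumption: Otsu thresholding to separate the dark crack from the brighter substance, light morphology to clean up noise, and a bounding box around the largest dark region. The filenames, kernel sizes and blur settings are assumptions to be tuned against your own frames.
import cv2
import numpy as np

def crack_bbox(frame_path):
    gray = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu picks the split between the bright substance and the dark crack;
    # THRESH_BINARY_INV makes the crack pixels white in the mask
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Remove small speckles, then close small gaps along the crack
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(largest)  # (x, y, w, h) in pixels

print(crack_bbox("frame_0001.png"))  # placeholder filename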
I appreciate you are only just getting started, but you have some issues with your video. Firstly, the lighting is not the best and it is not consistent, because people are moving around in front of it and casting shadows. It also doesn't illuminate the background behind the crack well; it would be better if the light was at the height of the crack and shining more into it, so that it better illuminates the background behind the crack. Secondly, you could do without the camera moving part-way through the experiment!
Finally, if you want to measure things you need to calibrate, which at the very least means putting a ruler in the image, or scale lines on your background at fixed intervals. If you are doing all that, you may as well make life easy for yourself and put markers of a specific colour/pattern, each different, on the top and bottom of the frame plates that are applying the load.
Then you want to do something like a flood fill, or a fill just within the confines of your material (probably by masking), to fill the crack with a different colour. It is then pretty simple to measure the length of the crack and its left-most extent.
With a proper segmentation approach you are going to get detailed geometry of the object extracted from a single frame. For example:
If you process multiple frames, you will be able to see how the geometry evolves over time. Having that, it should be easy to compare polygons to find shape changes, cracks, etc.:
I used to work with 4K video to get all the required details and good accuracy. You might not need all that data, but video is still much more flexible.
Here is a complete example: https://youtu.be/g2KyfrBtTA4
Provide some examples if you want to get more detailed recommendations.
Update
Real examples are always helpful. So you can segment a crack:
or a substance:
or both:
Basically, you need to enhance the overall quality of the input (focus, background under the substance, etc.).
As Mark Setchell showed, you might get unwanted background as part of the resulting shape (the right side of the crack), so it is better to make sure that doesn't happen, or to analyse only the substance.
Anyway, your task doesn't seem complex. It might even be trivial if you can improve the image quality and simplify the environment a little (a specific background, etc.).
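To turn per-frame measurements into a growth rate, something along these lines could work; this is only a sketch that assumes the crack_bbox() helper from the earlier snippet (hypothetical), photos taken at a fixed interval, and a mm-per-pixel calibration factor measured from a ruler or markers in the scene.
import glob

FRAME_INTERVAL_S = 60.0   # time between photos, in seconds (assumed)
MM_PER_PIXEL = 0.12       # calibration factor from a ruler in the scene (assumed)

lengths_mm = []
for path in sorted(glob.glob("frames/*.png")):   # placeholder directory
    box = crack_bbox(path)                       # (x, y, w, h) or None
    if box is not None:
        x, y, w, h = box
        lengths_mm.append(max(w, h) * MM_PER_PIXEL)

# Growth rate between consecutive frames, in mm per second
rates = [(b - a) / FRAME_INTERVAL_S for a, b in zip(lengths_mm, lengths_mm[1:])]
for i, r in enumerate(rates, start=1):
    print(f"frame {i}: crack length {lengths_mm[i]:.1f} mm, growth {r:.3f} mm/s")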

Anti-aliasing of random dot stereograms

I recently completed some Python (2.7) code for generating random-dot stereograms based on this paper. The output is fairly good, though I have noticed that, even with a smooth gradient in the depth map, the output stereogram lacks these smooth gradients and instead has discrete levels of depth. I believe this is due to the DPI chosen when generating the image. While the depth detail can be increased by increasing the DPI, this becomes impractical as the convergence point becomes more difficult to reach.
Here are two examples, the first at 75 DPI and the second at 175 DPI. In the 75 DPI image, distinct "triangles" of depth can be seen. In the 175 DPI image these are less pronounced, but the guidance dots at the bottom of the image are further apart, and therefore viewing the 3D image is more difficult.
I'm looking to modify my current code to anti-alias the 3D image in order to smooth out the gradients even at a lower DPI. I have tried using SSAA on the depth map and pattern, generating the stereogram, and then reducing the image size again with an anti-aliasing filter. However, this seems to confine the stereogram to the left of the image. For example, if I make the image 4 times bigger, the stereogram is limited to the left-hand quarter of the image; the rest is just random noise and cannot be viewed. How would I go about anti-aliasing the image hidden in the stereogram? My code is almost the same as the algorithm described in the paper, so an anti-aliasing algorithm based on that would be perfect.
The problem I was having, with the stereogram being confined to the left of the image, was caused by not extending the "same" array (from the paper's algorithm) to match the larger depth map. This caused everything beyond the original length of the depth map to be randomly generated noise.
After solving this, a second problem arose: the 3D image was distorted by the anti-aliasing, causing more gradient issues than it solved. My solution was to increase the DPI setting in the code to match. For example, if I increase the size of the depth map by 4x, the stereogram must be generated with a DPI 4 times greater (300 rather than 75). When scaled down again, this produced excellent results.
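For reference, a minimal sketch of that supersample-then-downscale step using Pillow; generate_stereogram(), depth_map_scaled_by(), the 4x factor and the base 75 DPI are hypothetical stand-ins for the actual code.
from PIL import Image

AA_FACTOR = 4       # supersampling factor (assumed)
BASE_DPI = 75       # target output DPI (assumed)

# Hypothetical generator: depth map and DPI are both scaled up by the same factor
big = generate_stereogram(depth_map_scaled_by(AA_FACTOR), dpi=BASE_DPI * AA_FACTOR)

# Downscale with a high-quality resampling filter to get the anti-aliased result
small = big.resize((big.width // AA_FACTOR, big.height // AA_FACTOR), Image.LANCZOS)
small.save("stereogram_aa.png", dpi=(BASE_DPI, BASE_DPI))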
This image uses 2x SSAA, making the gradients comparable with the 175 DPI image from the question, but with a much easier convergence point.
This image uses 4x SSAA, and I find the jaggies barely visible at all. The noise here becomes a lot more blurred and the general colour of the image becomes quite grey. I have found this effect can be avoided by pre-generating the noise and scaling it up by the same AA factor, as demonstrated in the next image.

Parameters of detectMultiScale in OpenCV using Python

I am not able to understand the parameters passed to detectMultiScale. I know that the general syntax is detectMultiScale(image, rejectLevels, levelWeights)
However, what do the parameters rejectLevels and levelWeights mean? And what are the optimal values used for detecting objects?
I want to use this to detect the pupil of the eye.
Among these parameters, you need to pay the most attention to four of them:
scaleFactor – Parameter specifying how much the image size is reduced at each image scale.
Basically, the scale factor is used to create your scale pyramid. To explain further: your model has a fixed size defined during training, which is visible in the XML. This means that a face of this size is detected in the image if present. However, by rescaling the input image, you can resize a larger face to a smaller one, making it detectable by the algorithm.
1.05 is a good possible value for this, which means you use a small step for resizing, i.e. reduce the size by 5%; you increase the chance that a size matching the model is found. This also means that the algorithm works more slowly, since it is more thorough. You may increase it to as much as 1.4 for faster detection, at the risk of missing some faces altogether.
minNeighbors – Parameter specifying how many neighbors each candidate rectangle should have to retain it.
This parameter affects the quality of the detected faces. A higher value results in fewer detections but of higher quality. 3–6 is a good range for it.
minSize – Minimum possible object size. Objects smaller than that are ignored.
This parameter determines how small a size you want to detect. You decide! Usually, (30, 30) is a good start for face detection.
maxSize – Maximum possible object size. Objects bigger than this are ignored.
This parameter determines how big a size you want to detect. Again, you decide! Usually you don't need to set it manually; the default assumes you want to detect faces with no upper size limit.
A code example can be found here:
http://docs.opencv.org/3.1.0/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0
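In the same spirit, here is a minimal sketch of a call that sets all four of those parameters explicitly; the bundled frontal-face cascade and the values below are common defaults for illustration, not settings tuned for your pupil use case.
import cv2

# Bundled Haar cascade shipped with opencv-python
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

gray = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2GRAY)  # placeholder filename
faces = face_cascade.detectMultiScale(gray,
                                      scaleFactor=1.05,    # 5% shrink per pyramid level
                                      minNeighbors=5,      # stricter -> fewer, better detections
                                      minSize=(30, 30),    # ignore objects smaller than 30x30 px
                                      maxSize=(300, 300))  # ignore objects larger than 300x300 px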
Regarding the parameter descriptions, you may have quoted old parameter definitions; in fact, you are more likely to be dealing with the following parameters:
scaleFactor: Parameter specifying how much the image size is reduced at each image scale.
minNeighbors: Parameter specifying how many neighbors each candidate rectangle should have to retain it.
Here you can find a nice explanation on these parameters:
http://www.bogotobogo.com/python/OpenCV_Python/python_opencv3_Image_Object_Detection_Face_Detection_Haar_Cascade_Classifiers.php
Make sure you obtain the proper pretrained classifier files for faces and eyes, such as
haarcascade_frontalface_default.xml
haarcascade_eye.xml
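Since the goal is detecting the pupil, a common pattern is to detect faces first and then search for eyes only inside each face region. Here is a minimal sketch of that, assuming the two cascade files above are available via cv2.data.haarcascades and a placeholder input image; locating the pupil itself would still need a further step (e.g. thresholding or Hough circles inside the eye ROI).
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("face.jpg")                 # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60))
for (x, y, w, h) in faces:
    roi = gray[y:y + h, x:x + w]             # search for eyes only inside the face
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(img, (x + ex, y + ey), (x + ex + ew, y + ey + eh), (0, 255, 0), 2)

cv2.imwrite("eyes.jpg", img)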
The OpenCV Class List docs provide the descriptions for all C++ and Python methods.
Here is the one for cv::CascadeClassifier detectMultiScale:
detectMultiScale
Python:
objects = cv.CascadeClassifier.detectMultiScale(image[, scaleFactor[, minNeighbors[, flags[, minSize[, maxSize]]]]])
Parameters:
image: Matrix of the type CV_8U containing an image where objects are detected.
objects: Vector of rectangles where each rectangle contains the detected object; the rectangles may be partially outside the original image.
scaleFactor: Parameter specifying how much the image size is reduced at each image scale.
minNeighbors: Parameter specifying how many neighbors each candidate rectangle should have to retain it.
flags: Parameter with the same meaning as for an old cascade in the function cvHaarDetectObjects. It is not used for a new cascade.
minSize: Minimum possible object size. Objects smaller than that are ignored.
maxSize: Maximum possible object size. Objects larger than that are ignored. If maxSize == minSize, the model is evaluated on a single scale.
Note
(Python) A face detection example using cascade classifiers can be found at opencv_source_code/samples/python/facedetect.py
As noted, a sample usage is available from the OpenCV source code. You can pass in each documented parameter as a keyword.
rects = cascade.detectMultiScale(img,
                                 scaleFactor=1.3,
                                 minNeighbors=4,
                                 minSize=(30, 30),
                                 flags=cv.CASCADE_SCALE_IMAGE)
The detectMultiScale function is used to detect the faces. It returns a rectangle with coordinates (x, y, w, h) around each detected face.
It takes three common arguments: the input image, scaleFactor, and minNeighbours.
scaleFactor specifies how much the image size is reduced at each scale. In a group photo, some faces may be nearer the camera than others, and such faces naturally appear more prominent than the ones behind; this factor compensates for that.
minNeighbours specifies how many neighbours each candidate rectangle should have to retain it. You can read about it in detail here. You may have to tweak these values to get the best results; this parameter is the number of neighbours a rectangle needs in order to be called a face.
These values are obtained by trial and error over a sensible range.
