I recently completed some Python (2.7) code for generating random dot stereograms based on this paper. The output is fairly good, though I have noticed that, even with a smooth gradient in the depth map, the output stereogram lacks these smooth gradients, instead showing discrete steps of depth. I believe this is due to the DPI chosen when generating the image. While the depth detail can be increased by increasing the DPI, this becomes impractical as the convergence point becomes more difficult to reach.
Here are two examples, the first at 75 DPI and the second at 175 DPI. In the 75 DPI image, distinct "triangles" of depth can be seen. In the 175 DPI image, these are less pronounced, but the guidance dots at the bottom of the image are further apart, and therefore viewing the 3D image is more difficult.
I'm looking to modify my current code to anti-alias the 3D image in order to smooth out the gradients even at a lower DPI. I have tried using SSAA on the depth map and pattern, generating the stereogram, then reducing the image size again with an anti-aliasing filter. However, this seems to just confine the stereogram to the left of the image. For example, if I make the image 4 times bigger, the stereogram is limited to the left-hand quarter of the image; the rest is just random noise and cannot be viewed. How would I go about anti-aliasing the image hidden in the stereogram? My code is almost the same as the algorithm described in the paper, so an anti-aliasing algorithm based on that would be perfect.
The problem I was having, with the stereogram being confined to the left of the image, was caused by not extending the same[] array to match the larger depth map. This meant that everything beyond the original length of the depth map was randomly generated noise.
After solving this problem, a second problem arose: the 3D image was distorted by the anti-aliasing, causing more gradient issues than it solved. My solution was to increase the DPI setting in the code by the same factor. For example, if I increase the size of the depth map by 4x, the stereogram must be generated with a DPI four times greater (300 rather than 75). When scaled down again, this produced excellent results.
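In code, the workflow looks roughly like the sketch below. It assumes an existing generate_stereogram(depth_map, dpi) function returning a PIL image (the name is a placeholder for whatever the original code exposes); only the supersample/downscale wrapper is shown.

```python
from PIL import Image

AA = 4                    # supersampling factor
BASE_DPI = 75             # DPI of the final, downscaled stereogram

depth = Image.open("depth_map.png")
w, h = depth.size
big_depth = depth.resize((w * AA, h * AA), Image.NEAREST)

# Generate at AA times the DPI so the hidden image is not distorted...
big_stereo = generate_stereogram(big_depth, dpi=BASE_DPI * AA)

# ...then downscale with an anti-aliasing filter back to the target size.
bw, bh = big_stereo.size
stereo = big_stereo.resize((bw // AA, bh // AA), Image.ANTIALIAS)  # Image.LANCZOS in newer Pillow
stereo.save("stereogram.png")
```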
This image uses 2x SSAA, making the gradients comparable with the 175 DPI image from the question, but with a much easier convergence point.
This image uses 4x SSAA, and I find the jaggies barely visible at all. The noise here becomes a lot more blurred and the general colour of the image becomes quite grey. I have found this effect can be avoided by pregenerating the noise and scaling that up by the same AA factor. This is demonstrated in the next image.
I have the following JPG image. I want to find the edges where the white page meets the black background, so I can rotate the contents a few degrees clockwise. My aim is to straighten the text for use with Tesseract OCR conversion. I don't see the need to rotate the text blocks as I have seen in similar examples.
In the Canny Edge Detection docs, the third arg (200 in e.g. edges = cv.Canny(img,100,200)) is maxVal, and pixels above it are said to be 'sure to be edges'. Is there any way to determine these (max/min) values ahead of any trial-and-error approach?
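One common rule of thumb (not from this thread, just a frequently used heuristic) is to derive the two Canny thresholds from the image median rather than picking them by hand:

```python
import cv2
import numpy as np

img = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder filename
img = cv2.GaussianBlur(img, (5, 5), 0)               # smooth before edge detection

median = np.median(img)
sigma = 0.33                                         # widely used default
lower = int(max(0, (1.0 - sigma) * median))
upper = int(min(255, (1.0 + sigma) * median))
edges = cv2.Canny(img, lower, upper)
```

This only gives a starting point; high-contrast scans like this one often tolerate a wide range of values.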
I have used code examples which utilize the Python cv2 module. But the edge detection is set up for simpler applications.
Is there any approach I can use to take the text out of the equation? For example, only detecting edge lines greater than a specified length?
Any suggestions would be appreciated.
Below is an example of edge detection (above image, same min/max values). The outer edge of the page is clearly defined. The image is high-contrast b/w with even lighting, so I can't see a need for an adaptive threshold; a simple global one is working. It's just a question of what ratio to use.
I don't have the answer to this yet, but to add: I now have the contours of the above doc.
I used the find contours tutorial with some customization of the file loading. Note: removing words gives a thinner/cleaner outline.
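For illustration, here is a sketch of one way to go from those contours to a skew angle: keep the largest contour (the page outline) and read the angle off its minimum-area rectangle. The filename and threshold value are placeholders, and OpenCV's angle convention varies between versions.

```python
import cv2

img = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY)   # simple global threshold

# OpenCV 4.x return signature; 3.x returns an extra first value.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
page = max(contours, key=cv2.contourArea)                     # text bits are far smaller
(cx, cy), (w, h), angle = cv2.minAreaRect(page)
print("page angle:", angle)
```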
Consider Otsu. Its chief virtue is that it adapts the threshold to the illumination within the image. In your case, blank margins might be the saving grace.
Consider working on a series of 2x-reduced-resolution images, where each new pixel is the min() (or even max()!) of the original four pixels. These reduced images might help you focus on the features that matter for your use case.
The usual way to deskew scanned text is to binarize and then keep changing theta until the "sum of pixels across raster" is zero, or small. In particular, with few descenders and decent inter-line spacing, we will see "lots" of pixels on each line of text and "near zero" between text lines when theta matches the original printing orientation. That lets us recover (1) pixels per line and (2) inter-line spacing, assuming we've found a near-optimal theta.
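A rough sketch of that projection-profile idea, using the variance of the row sums as the "peakiness" score (a common variant of the near-zero-between-lines criterion; the filename and angle range are placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

def profile_score(binary, angle):
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_NEAREST)
    rows = rotated.sum(axis=1)
    return np.var(rows)          # high when rows alternate between full and empty

angles = np.arange(-5, 5.1, 0.1)
best = max(angles, key=lambda a: profile_score(binary, a))

h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), best, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC)
cv2.imwrite("deskewed.jpg", deskewed)
```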
In your particular case, focusing on the ... leader dots seems a promising approach to finding the globally optimal deskew correction angle. Discarding large rectangles of pixels in the left and right regions of the image could actually reduce noise and enhance the accuracy of such an approach.
I know the basic flow or process of image registration/alignment, but what happens at the pixel level when two images are registered/aligned? I.e. the similar pixels of the moving image, which is transformed to match the fixed image, are kept intact, but what happens to the pixels which are not matched? Are they averaged, or something else?
And how is the correct transformation estimated, i.e. how will I know whether to apply translation, scaling, rotation, etc., and how much (i.e. what value in degrees for rotation, what values for translation, etc.) to apply?
Also, in the initial step, how are the similar pixel values identified and matched?
I've implemented the python code given in https://simpleitk.readthedocs.io/en/master/Examples/ImageRegistrationMethod1/Documentation.html
Input images are of prostate MRI scans:
[Images: Fixed Image, Moving Image, Output Image, Console output]
The difference can be seen in the output image, top right and top left. But I can't interpret the console output, or how things actually work internally.
It'll be very helpful if I get a deep explanation of this thing. Thank you.
A transformation is applied to all pixels. You might be confusing rigid transformations, which will only translate, rotate and scale your moving image to match the fixed image, with elastic transformations, which will also allow some morphing of the moving image.
Any pixel that a transformation cannot place in the fixed image is interpolated from the pixels that it is able to place, though a registration is not really intelligent.
What it attempts to do is simply reduce a cost function, where a high cost is associated with a large difference and a low cost with a small difference. Cost functions can be intensity-based (pixel values) or feature-based (shapes). It will (semi-)randomly shift the image around until a preset criterion is met, generally a maximum number of iterations.
What that might look like can be seen in the following gif:
http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/registration_visualization.gif
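Here is a trimmed-down sketch along the lines of the linked SimpleITK example, just to make the pieces concrete: a cost function (mean squares of pixel differences), an optimizer that keeps adjusting a transform to lower that cost, and an interpolator that fills in pixel values once the moving image is resampled onto the fixed image's grid. Filenames and optimizer settings are placeholders.

```python
import SimpleITK as sitk

fixed = sitk.ReadImage("fixed.png", sitk.sitkFloat32)
moving = sitk.ReadImage("moving.png", sitk.sitkFloat32)

reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMeanSquares()                 # the cost: big difference = big cost
reg.SetOptimizerAsRegularStepGradientDescent(4.0, 0.01, 200)   # learning rate, min step, iterations
reg.SetInitialTransform(sitk.TranslationTransform(fixed.GetDimension()))
reg.SetInterpolator(sitk.sitkLinear)         # how in-between pixel values are filled in

transform = reg.Execute(fixed, moving)
print("Final metric value:", reg.GetMetricValue())
print("Stopping condition:", reg.GetOptimizerStopConditionDescription())

# Resample the moving image onto the fixed image's grid using that transform.
result = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0, moving.GetPixelID())
sitk.WriteImage(result, "registered.png")
```

The console output in the linked example is essentially this kind of information: the metric value as the iterations proceed, plus the optimizer's final value and stopping condition.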
When humans see markers suggesting the form of a shape, they immediately perceive the shape itself, as in https://en.wikipedia.org/wiki/Illusory_contours. I'm trying to accomplish something similar in OpenCV in order to detect the shape of a hand in a depth image with very heavy noise. In this question, assume that skin color based detection is not working (actually it is the best I've achieved so far but it is not robust under changing light conditions, shadows or skin colors. Also various paper shapes (flat and colorful) are on the table, confusing color-based approaches. This is why I'm attempting to use the depth cam instead).
Here's a sample image of the live footage that is already pre-processed for better contrast and with background gradient removed:
I want to isolate the exact shape of the hand from the rest of the picture. For a human eye this is a trivial thing to do. So here are a few attempts I did:
Here's the result with canny edge detection applied. The problem here is that the black shape inside the hand is larger than the actual hand, causing the detected hand to overshoot in size. Also, the lines are not connected and I fail at detecting contours.
Update: Combining Canny and a morphological closing (4x4 px ellipse) makes contour detection possible with the following result. It is still waaay too noisy.
Update 2: The result can be slightly enhanced by drawing that contour to an empty mask, saving that in a buffer and re-detecting yet another contour on a merge of three buffered images. The line that combines the buffered images is hand_img = np.array(np.minimum(255, np.multiply.reduce(self.buf)), np.uint8), which is then morphed once again (closing) and finally contour-detected. The results are slightly less horrible than in the picture above, but laggy instead.
Alternatively I tried to use an existing CNN (https://github.com/victordibia/handtracking) for detecting the approximate position of the hand's center (this step works) and then flooding from there. In order to detect contours, the result is put through an OTSU filter and then the largest contour is taken, resulting in the following picture (ignore the black rectangles on the left). The problem is that some of the noise is flooded as well and the results are mediocre:
Finally, I tried background removers such as MOG2 or GMG. They are confused by the enormous amount of fast-moving noise. Also they cut off the fingertips (which are crucial for this project). Finally, they don't see enough details in the hand (8 bit plus further color reduction via equalizeHist yield a very poor grayscale resolution) to reliably detect small movements.
It's ridiculous how simple it is for a human to see the exact precise shape of the hand in the first picture and how incredibly hard it is for the computer to draw a shape.
What would be your recommended method to achieve an exact hand segmentation?
After two days of desperate testing, the solution was to VERY carefully apply thresholding to a well-preprocessed image.
Here are the steps:
Remove as much noise as you possibly can. In my case, denoising was done using Intel's pyrealsense2 (I'm using an Intel RealSense depth camera and the algorithms were written for that camera family, thus they work very well). I used rs.temporal_filter() and directly after rs.hole_filling_filter() on every frame.
Capture the very first frame. Besides capturing the exact distance to the table (for later thresholding), this step also saves a still picture that is blurred by a 100x100 px kernel. Since the camera is never mounted perfectly but slightly tilted, there's an ugly grayscale gradient going over the picture, making operations impossible. This still picture is then subtracted from every single later frame, eliminating the gradient. BTW: this gradient removal step is already incorporated in the screenshots shown in the question above.
Now the picture is almost noise-free. Do not use equalizeHist: it does not simply increase the general contrast evenly but instead emphasizes the remaining noise far too much. This was the main error I made in almost all experiments. Instead, apply a threshold (binary, with a fixed cutoff) directly. The usable band is extremely thin; setting the cutoff at 104 instead of 105 makes a huge difference.
Invert the colors (unless you used BINARY_INV in the previous step), find contours, take the largest one and write it to a mask.
Voilà!
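Putting the gradient removal, thresholding and contour steps together, here is a rough OpenCV sketch (the filenames, the blur size and whether you need THRESH_BINARY or THRESH_BINARY_INV are assumptions; the 104 cutoff is taken from the steps above):

```python
import cv2
import numpy as np

reference = cv2.imread("first_frame.png", cv2.IMREAD_GRAYSCALE)
gradient = cv2.blur(reference, (100, 100))            # heavily blurred still picture

frame = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)
flat = cv2.subtract(frame, gradient)                  # remove the illumination gradient

# Fixed binary threshold -- the exact cutoff matters a lot, tune it per setup.
_, binary = cv2.threshold(flat, 104, 255, cv2.THRESH_BINARY_INV)

# OpenCV 4.x return signature; 3.x returns an extra first value.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
mask = np.zeros_like(binary)
if contours:
    hand = max(contours, key=cv2.contourArea)         # keep only the largest contour
    cv2.drawContours(mask, [hand], -1, 255, cv2.FILLED)
cv2.imwrite("hand_mask.png", mask)
```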
I am trying to detect people from a camera's feed using cv2.HOGDescriptor() and using their default people classifier.
The recognizer kinda works, but I honestly am having an issue understanding what values to assign to winStride, padding, scale and groupThreshold respectively.
Currently, the camera feed's frame size is 1280 X 720 and I resize it to 400 X 400 then perform detectMultiScale with parameters
hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05, 'finalThreshold': 2}
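For reference, a sketch of how such a dict is typically wired up with OpenCV's default people detector (the filename is a placeholder, and the exact keyword names accepted by detectMultiScale may depend on your OpenCV version):

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.resize(cv2.imread("frame.jpg"), (400, 400))      # placeholder input
hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05,
             'finalThreshold': 2}

rects, weights = hog.detectMultiScale(frame, **hogParams)
for (x, y, w, h) in rects:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```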
Based off of this answer, I understand what these parameters do and represent.
My question is, is there a way of mapping image size to these values? A mathematical equation? An estimation method? I am not necessarily asking for a concrete method, or even one that gives all the values, but something better than trial and error or magic numbers.
Most of the references and tutorials pretty much use magic numbers without giving a proposition of how they attained them.
PS: Here's a visual aid in case you're still not sure of my question
There is no silver bullet here. It is unfortunately very handwavy as the optimal solution will vary from input data to input data.
Here is a little extra guidance:
If stride > window size your detector might not even be run on the person. I always think of stride in relation to the window size e.g. 64/8.
If scale~1 not much will happen. Values like 1.2, 1.3 are usually better. This parameter essentially scales the image down, and then runs the detector again. The hope is, that if people were too big for the detector in the first run, they might be the right size after scaling down. E.g. if your detector size is the default 64x128, but some person in the image is 150px high, the detector might not realize it's a person as it can only view the legs, or the torso at once. If we scale down 150 / 1.2 = 125 the person might now actually be detected. (silly numbers. It is very plausible that it could detect the person if they were 150px. But you get the idea.)
The best way to go about it is to experiment a little. Choose some images/video that you think are representative of your use case, create an end-to-end setup, and play around with a couple of different parameter settings. If persons are not detected think about their sizes in relation to your detector size. Are they bigger than that? Smaller? If they are smaller perhaps increase the scale-factor, or increase the number of levels. If they are bigger, downscale the input image more.
.. 1280 X 720 and I resize it to 400 X 400...
Side note: If you are simply resizing without cropping you will get bad results. Either resize to the same aspect ratio such as 711x400, or crop the initial image to a square before resizing.
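A minimal way to follow that advice is to scale by a single factor so the aspect ratio is preserved (the 400 px target and the filename are just examples):

```python
import cv2

frame = cv2.imread("frame.jpg")                         # e.g. a 1280 x 720 frame
target_short_side = 400
scale = target_short_side / float(min(frame.shape[:2]))
resized = cv2.resize(frame, None, fx=scale, fy=scale)   # 711 x 400 for a 1280 x 720 input
```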
I have an image feature extraction problem. The input images are binary (black and white) and may contain blobs of approximately known area and aspect ratio. These need to be fit with ellipses using some best fit algorithm.
Example input:
Desired output:
There may be multiple blobs (zero or more), the number is not known in advance. The approximate area and aspect ratio of all the blobs is known (and is the same). How many are in the image, their position, orientation and actual size are what I'm trying to find. The output should be a best fit ellipse for each blob based on the actual found size and aspect ratio.
What makes this hard is noise and possible overlaps.
Example with noise:
Example with overlap and noise:
The noisy image may have holes in the blobs and also small other blobs scattered around. The small other blobs are not counted because they are too small and do not cover any area densely enough to be considered a real match.
The image with overlap should be counted as two blobs because the area is too big for a single blob to cover it well.
A possible metric that evaluates a potential fit is:
sum over all ellipses of
    ( K1 * percent deviation from expected size
    + K2 * percent deviation from expected aspect ratio
    + K3 * percent of the ellipse which is not black
    + K4 * percent overlapped with any other ellipse )
+ K5 * percent of the rest of the image which is black
for some suitably chosen parameters K1..K5. A perfect match scores 0.
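To make that objective concrete, here is a direct (and unoptimized) transcription of the metric; the ellipse format matches cv2.ellipse's rotated-rect form, and the weights and expected values are placeholders:

```python
import cv2
import numpy as np

def fit_cost(binary, ellipses, expected_area, expected_ar, K=(1, 1, 1, 1, 1)):
    """binary: uint8 image with blobs black (0) on white.
    ellipses: list of ((cx, cy), (width, height), angle) rotated-rect boxes.
    Returns the weighted cost; a perfect fit scores 0."""
    K1, K2, K3, K4, K5 = K
    black = binary == 0
    masks = []
    for box in ellipses:
        m = np.zeros(binary.shape, np.uint8)
        cv2.ellipse(m, box, 255, -1)
        masks.append(m.astype(bool))

    cost = 0.0
    covered = np.zeros(binary.shape, bool)
    for i, (box, m) in enumerate(zip(ellipses, masks)):
        w, h = box[1]
        area = np.pi * w * h / 4.0
        ar = max(w, h) / float(min(w, h))
        others = (np.logical_or.reduce([m2 for j, m2 in enumerate(masks) if j != i])
                  if len(masks) > 1 else np.zeros(binary.shape, bool))
        cost += K1 * abs(area - expected_area) / expected_area
        cost += K2 * abs(ar - expected_ar) / expected_ar
        cost += K3 * np.mean(~black[m])      # ellipse pixels that are not black
        cost += K4 * np.mean(others[m])      # ellipse pixels shared with other ellipses
        covered |= m
    rest = ~covered
    cost += K5 * (np.mean(black[rest]) if rest.any() else 0.0)
    return cost
```

A brute-force or stochastic search over centre, size and angle can then simply try candidate sets of ellipses and keep the one with the lowest fit_cost.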
I can see how to solve this using brute force, for example trying enough different possible fits to sample the search space well. I can't think off-hand of a method much faster than brute force.
I would prefer examples in python and/or opencv. I will try to implement and post any suggested solutions in python. Thanks!
P.S. It cannot be assumed that a blob is connected. There may be enough noise to break it up into discontinuous parts.
P.P.S. The little bits of noise cannot be removed by binary erosion. In some of my images, there are enough interior holes that erosion makes the whole (real) blob disappear if the image is eroded enough to make the noise bits disappear as well.
P.P.P.S. I think that it would be very hard to solve this using any approach based on contours. The data I see in practice has too much edge noise; there can be (and often are) bits of noise that connect separate blobs, or that separate a single blob into several (apparent) connected components. I would like an approach based on areas, since area coverage seems to be much less noisy than the edge shapes.
P.P.P.P.S. As requested, here is an example with a through cut due to noise:
and a sample with lots and lots of noise but nevertheless a distinct blob:
EDIT: None of the answers actually solves the problem, although Bharat has suggested a partial solution which does well for non-overlapping blobs. More please :) I will award an additional bounty to any actual solution.
I would try to fit a Gaussian mixture model and then use the means and covariance matrices to fit ellipses over the data. Such a model would work even with overlap and small noisy blobs. The data you have would be the coordinates of the pixels which are black, and you could fit a GMM over that data. One issue with this approach is that you need to know the number of blobs you need to track in advance; if you can come up with a heuristic for that, a GMM should solve this problem pretty efficiently.
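A hypothetical sketch of that idea with scikit-learn (the filename is a placeholder, and n_blobs is assumed known or estimated, e.g. total black area divided by the expected blob area):

```python
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

img = cv2.imread("blobs.png", cv2.IMREAD_GRAYSCALE)
ys, xs = np.where(img < 128)                          # coordinates of black pixels
pts = np.column_stack([xs, ys]).astype(float)

n_blobs = 2                                           # assumed known / estimated
gmm = GaussianMixture(n_components=n_blobs, covariance_type="full").fit(pts)

vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for mean, cov in zip(gmm.means_, gmm.covariances_):
    vals, vecs = np.linalg.eigh(cov)                  # principal axes of the Gaussian
    angle = np.degrees(np.arctan2(vecs[1, -1], vecs[0, -1]))
    axes = tuple(int(2.0 * np.sqrt(v)) for v in vals[::-1])   # ~2 sigma per semi-axis
    cv2.ellipse(vis, (int(mean[0]), int(mean[1])), axes, angle, 0, 360, (0, 0, 255), 2)
cv2.imwrite("gmm_fit.png", vis)
```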
You fill the holes [1], detect contours [2], and use moments on each contour to find orientation, eccentricity, etc. [3].
P.S.: The disconnected contours (noise) can be filtered out by size.
You might start by filtering out contours by area.
About separating overlapping blobs, that might be a tricky one (I would risk saying impossible for arbitrary overlaps) to do with this binary image; maybe you should do it with the original image, or at least a few pre-processing steps back.
OpenCV's fitEllipse will also be helpful.
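A rough sketch of that fill-holes / contours / fitEllipse route, using a morphological closing as a simple stand-in for hole filling (the filename, kernel size and expected area are assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("blobs.png", cv2.IMREAD_GRAYSCALE)
blobs = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY_INV)[1]   # blobs -> white

# Close small holes and cuts before looking for contours.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
closed = cv2.morphologyEx(blobs, cv2.MORPH_CLOSE, kernel)

# OpenCV 4.x return signature; 3.x returns an extra first value.
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

expected_area = 4000                     # assumed known, in pixels
vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for c in contours:
    if cv2.contourArea(c) < 0.3 * expected_area:   # drop small noise blobs by size
        continue
    if len(c) >= 5:                                # fitEllipse needs at least 5 points
        cv2.ellipse(vis, cv2.fitEllipse(c), (0, 0, 255), 2)
cv2.imwrite("contour_fit.png", vis)
```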
This is not a basic programming problem; it involves advanced image processing techniques. From what I know, image processing and morphology are the areas you are targeting. You could take a course in image morphology and learn the basic constructs such as dilation and erosion; then you will have the fundamentals for solving the issue at hand.
Since you have the size and orientation, you can draw each ellipse, and use template matching.
See the tutorial here.
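A minimal sketch of that draw-an-ellipse-and-match idea; the template size, axes and match threshold are guesses, and a real version would add non-maximum suppression and a sweep over rotations:

```python
import cv2
import numpy as np

img = cv2.imread("blobs.png", cv2.IMREAD_GRAYSCALE)
blobs = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY_INV)[1]

# Draw one white ellipse of the expected size/orientation as the template.
templ = np.zeros((80, 120), np.uint8)
cv2.ellipse(templ, (60, 40), (50, 30), 0, 0, 360, 255, -1)

res = cv2.matchTemplate(blobs, templ, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res > 0.6)                   # match threshold is a guess
vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for x, y in zip(xs, ys):
    cv2.rectangle(vis, (int(x), int(y)), (int(x) + 120, int(y) + 80), (0, 0, 255), 1)
cv2.imwrite("matches.png", vis)
```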