What does "pre-image" mean in this context? - python

I am comparing the performance of feature detection algorithms like Harris, ORB, and MSER (OpenCV Python). I have two images of the same object taken from different viewpoints. Since I am just a beginner in this area, I am having trouble understanding what "pre-image" means in this context. How do I get the "pre-image"?
Detecting regions covariant with a class of transformations has now reached some maturity in the computer vision literature. The requirement for these regions is that they should correspond to the same pre-image for different viewpoints, i.e., their shape is not fixed but automatically adapts, based on the underlying image intensities, so that they are the projection of the same 3D surface patch.

This seems to be a quote from "A comparison of affine region detectors" paper.
You can interpret the term "pre-image" to mean the "3D surface patch" (that corresponds to a region in 2D). In other words, it's a part of the "scene"/"3D environment" that's being photographed.
The confusion arises because authors are using the term to refer to the mathematical object, rather than anything to do with photographs/images.
To elaborate more -- consider the process of taking a photograph of a 3D scene as a mathematical function. This function has several inputs: the 3D scene itself, the viewpoint, illumination, and so on. The output is a 2D array of pixel intensities. Given a 2D array of pixel intensities, and focusing on a particular 2D region, the corresponding 3D surface patch forms the "pre-image" of the function.
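To make this slightly more formal (the notation here is mine, not the paper's): if the imaging process is a function f that maps a 3D surface patch S (together with viewpoint, illumination, etc.) to a 2D image region R = f(S), then the pre-image of R is f^{-1}(R), the set of scene points that project into R. The requirement in the quote is that a region detected in two views, R_1 in image 1 and R_2 in image 2, should satisfy f_1^{-1}(R_1) = f_2^{-1}(R_2) = S, i.e. both detected regions are projections of the same physical surface patch S.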

Related

How to extract contour of the front panel the washing machine?

I am looking a robust way to extract the contour of the front panel of a washing machine. Or just get 4 corner points of the front panel.
I've tried color masking, but the results weren't stable.
Here are some examples:
Three potential options:
1. Get a bunch of images of the machines, manually determine a label saying where the door is, and then train a convolutional neural network to regress those parameters per image.
2. Treat each image as a separate optimization problem, where the goal is to estimate the parameters of the rectangle most likely to correspond to the front panel. So our model is theta = (p_1, p_2, p_3, p_4), the four 2D corner locations of the panel in the image. We need an energy function E to minimize with respect to theta (e.g., using gradient descent with momentum, or RANSAC). There are a number of terms you can use; here are some ideas:
a. At least some of the corners should be "corner-like": run a simple corner detector, and define an energy E_corner which penalizes distance to the closest corner.
b. At least some of the edges (between p_1 and p_2 or p_3, for example) should be "edge-like": compute the gradient magnitude of the image, M = ||∇I||, and enforce that M is large along the panel edges, using an energy E_edge. E.g., for (x, y) along an edge, let E_edge(x, y) = 1 / (1 + M(x, y)) (robust losses tend to work better here, though).
c. Use the fact that each door is actually a projected 3D rectangle: e.g., see this question. An interesting idea is to start with a rectangle (representing the panel) and, instead of regressing the p_i's, regress the parameters of an affine transform, or even a perspective projection transform (though this requires the algorithm to estimate depth), that maps the starting rectangle to one in the image. You can then regularize the parameters of the estimated transform to prevent unlikely transforms from being output.
d. Use knowledge of what must be inside the rectangle. For instance, given the four corners, you can determine the ellipse defining the round door to the machine. The appearance statistics within that ellipse should be somewhat unique, as well as the edges/image gradient at the door boundary; hence you can define an energy term encouraging the model to choose corners such that the interior has a dark elliptical object on a white background.
Overall, this approach is similar to snakes, or active contour models, which might be worth looking into. However, energy-minimizing snakes tend not to consider the inside of the region they enclose; hence, some variant of the Mumford-Shah functional could be a useful addition (though note that smoothness of the "door region" is not entirely desirable in your case).
3. If all your machines are very similar or nearly identical (as the ones you've posted are), it might actually be best to estimate a homography between the images (see also here or here). Since the front of the machine is nearly planar, the fronts in different images must be related by a homography. Then knowing where the front panel is in one image will tell you where it is in all of them. For instance, check out the OpenCV tutorial for homographies, where they show how to undo the perspective transform of a planar surface, allowing you to warp one projected machine panel onto another template one. A minimal sketch of this idea follows.
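As a rough illustration of option 3, here is a hedged sketch of estimating that homography with OpenCV feature matching, assuming a reasonably recent OpenCV build where SIFT is available. The file names and the manually annotated template corners are placeholders, not anything from the original question:

import cv2
import numpy as np

# Template image in which the panel corners are already known (hypothetical files).
template = cv2.imread('machine_template.jpg', cv2.IMREAD_GRAYSCALE)
query = cv2.imread('machine_new.jpg', cv2.IMREAD_GRAYSCALE)

# Panel corners annotated once, by hand, in the template image (placeholder values).
panel_corners = np.float32([[120, 80], [520, 80], [520, 600], [120, 600]]).reshape(-1, 1, 2)

# Match features between the two machine fronts.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(query, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]  # Lowe's ratio test

# Robustly estimate the homography relating the two (nearly planar) fronts.
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Map the known template corners into the new image.
new_corners = cv2.perspectiveTransform(panel_corners, H)
print(new_corners.reshape(-1, 2))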

Conversion from pixel to general Metric(mm, in)

I am using OpenCV to process an image and HoughCircles to detect the circles in the image under test. I am also calculating the distance between their centers using the Euclidean distance.
Since this is in pixels, I need the absolute distances in mm or inches. Can anyone let me know how this can be done?
Thanks in advance.
The image formation process implies taking a 2D projection of the real, 3D world, through a lens. In this process, a lot of information is lost (e.g. the third dimension), and the transformation is dependent on lens properties (e.g. focal distance).
The transformation between the distance in pixels and the physical distance depends on the depth (the distance between the camera and the object) and the lens. The complex but more general way is to estimate the depth (there are specialized algorithms which can do this under certain conditions, but they require multiple cameras/perspectives) or to use a depth camera which measures the depth directly. Once the depth is known, and after taking into account the effects of the lens projection, an estimate can be made.
You do not give much information about your setup, but the transformation can be measured experimentally. You simply take a picture of an object of known dimensions and determine the physical dimension of one pixel (e.g. if the object is 10x10 cm and spans 100x100 px in the picture, then one pixel corresponds to 1 mm). This is strongly dependent on the distance between the object and the camera.
An approach that is a bit more automated is to use a pattern of known dimensions (e.g. a checkerboard). It can be detected automatically in the image and the same transformation can be derived, as sketched below.
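For example, here is a hedged sketch of deriving a mm-per-pixel scale from a detected checkerboard. The board dimensions, square size, and file name are assumptions for illustration, and this simple scale is only valid at the board's depth and for a roughly fronto-parallel view:

import cv2
import numpy as np

# Assumed calibration target: 9x6 inner corners, 25 mm squares (placeholders).
pattern_size = (9, 6)
square_size_mm = 25.0

img = cv2.imread('scene_with_checkerboard.jpg', cv2.IMREAD_GRAYSCALE)
found, corners = cv2.findChessboardCorners(img, pattern_size)
if not found:
    raise RuntimeError('Checkerboard not found')

# Refine the corners, then measure the average pixel spacing between neighbouring corners.
corners = cv2.cornerSubPix(img, corners, (11, 11), (-1, -1),
                           (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
corners = corners.reshape(pattern_size[1], pattern_size[0], 2)
horizontal = np.linalg.norm(np.diff(corners, axis=1), axis=2)
mm_per_pixel = square_size_mm / horizontal.mean()

# Convert a circle-centre distance measured in pixels to millimetres.
distance_px = 340.0  # e.g. from HoughCircles centres
print(distance_px * mm_per_pixel, 'mm')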

Determine the distance between a camera and person/face in python-OpenCV

There are many tutorials on how to calculate the distance between a camera and an object. Is it possible to calculate the approximate distance between a detected person and the camera using OpenCV?
Yes, it is possible. As mentioned by @hkchengrex, consider your face an object. There are plenty of methods; of those described following that link, I'd recommend SIFT feature matching.
Here are roughly the required steps (a minimal code sketch follows the list):
1. Take a picture of the person and measure the distance manually.
2. Crop this picture to only contain the person.
3. Extract the image features (e.g. as a SIFT descriptor).
4. Take a second picture with the same person but at an unknown distance.
5. Detect the person via SIFT matching (see the link above).
6. Compute a transformation between those two sets of SIFT features.
7. Apply the transformation to the distance measured in step 1.
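A hedged sketch of that pipeline, assuming a roughly fronto-parallel person and a pinhole camera, so that apparent size scales inversely with distance. The file names and the reference distance are placeholders:

import cv2
import numpy as np

REFERENCE_DISTANCE_M = 2.0  # distance measured manually for the reference photo (placeholder)

ref = cv2.imread('person_reference_crop.jpg', cv2.IMREAD_GRAYSCALE)    # steps 1-2
new = cv2.imread('person_unknown_distance.jpg', cv2.IMREAD_GRAYSCALE)  # step 4

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)  # step 3
kp_new, des_new = sift.detectAndCompute(new, None)

# Step 5: match reference features against the new image (Lowe's ratio test).
matches = cv2.BFMatcher().knnMatch(des_ref, des_new, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]

# Step 6: estimate a similarity transform between the matched keypoints;
# its scale tells us how much smaller/larger the person now appears.
src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_new[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
scale = np.sqrt(M[0, 0] ** 2 + M[0, 1] ** 2)

# Step 7: intercept theorem — apparent size is inversely proportional to distance.
estimated_distance = REFERENCE_DISTANCE_M / scale
print(round(estimated_distance, 2), 'm')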
It's best to start with the link provided and further SIFT tutorials in OpenCV. The required approach is a very simple one and will only work if the person in the picture being examined is very similar to the person in picture one. For more advanced approaches I'd refer you to scientific papers; search for "person detection".
In reply to the comments
TL;DR: a person with the same height/width in reality, but displayed smaller/larger in the image, can be measured in terms of distance.
The depicted approach works under the hood as follows. The person (i.e. the cropped image) captured at step 2 can be found in any future image as long as he/she appears very similar. In the new image, matching gives you the rectangular region where the person is located. As the dimensions of this rectangle are now smaller/larger, you can use that change to compute the transformation (which is basically the intercept theorem) and thereby the new distance.
What does this mean for a general approach measuring ANY person?
In case the person has the same width/height as the person from step 2, this process works flawlessly. In case they are of similar but not identical height/width, there will be calculation errors, but the results MAY still suffice for your use case (you can define a generic human, e.g. 1.8 m of height and XX of width). Nevertheless, SIFT might be a bit too specific here; I'd refer you to Google to see what works best.
If your camera is fixed and the recorded scene doesn't change too much, I'd just define a ground plane and manually annotate every pixel projected onto this plane with a depth value. Then you only have to detect the arbitrary person, see where their feet touch the ground plane, and look up that pixel's predefined depth value, as sketched below.
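A hedged sketch of that lookup, using OpenCV's stock HOG pedestrian detector and a hypothetical pre-annotated depth map for the fixed camera (the file names are placeholders):

import cv2
import numpy as np

# Depth (in metres) annotated once per ground-plane pixel for the fixed camera (placeholder file).
ground_depth = np.load('ground_plane_depth.npy')

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread('fixed_camera_frame.jpg')
rects, weights = hog.detectMultiScale(frame, winStride=(8, 8))

for (x, y, w, h) in rects:
    # Assume the feet are at the bottom-centre of the detection box.
    foot_x, foot_y = x + w // 2, min(y + h, ground_depth.shape[0] - 1)
    print('Estimated distance:', ground_depth[foot_y, foot_x], 'm')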
If the use case has higher demands you'd have to measure depth in a more complex fashion. This can be done using a stereo-camera rig, a depth sensor or an image sequence via structure from motion.
So there is no one-size-fits-all method in OpenCV; it always depends on the use case, the environment, and a combination of quite elaborate methods.

How to flatten 3D object surface into 2D array?

I've got 3D objects which are represented as numpy arrays.
How can I unfold the "surface" of such objects to get a 2D map of values (I don't care about inner values)?
It's similar to unwrapping a globe's surface, but the shape varies from case to case.
This is essentially a UV-unwrapping problem. Each triangle on the model is a flat surface that can be mapped to a 2D plane, so the most naive solution, without any assumed structure, would be to:
for triangle in mesh:
    # project onto the plane defined by the triangle's normal, to avoid stretching
This solution is not ideal, as it places all of the UVs on top of each other. The next step would be to spread out the triangles to fill a certain space. This is the layout stage, which defines how the vertices are laid out in 2D space.
Usually it is ideal to fit the UVs within a unit square, which allows easy UV mapping from a single image. A minimal sketch of the per-triangle projection follows.
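A hedged numpy sketch of the per-triangle step. The mesh array of vertex triples is a hypothetical input, and the layout/packing of the resulting 2D triangles is deliberately left out:

import numpy as np

def triangle_to_2d(tri):
    """Project one 3D triangle (3x3 array of vertices) onto its own plane,
    returning three (u, v) coordinates with no stretching."""
    a, b, c = tri
    normal = np.cross(b - a, c - a)
    normal /= np.linalg.norm(normal)
    # Build an orthonormal basis (u, v) spanning the triangle's plane.
    u = (b - a) / np.linalg.norm(b - a)
    v = np.cross(normal, u)
    return np.array([[(p - a) @ u, (p - a) @ v] for p in tri])

# Hypothetical mesh: an (n_triangles, 3, 3) array of vertex coordinates.
mesh = np.array([[[0, 0, 0], [1, 0, 0], [0, 1, 1]]], dtype=float)
flat_triangles = [triangle_to_2d(tri) for tri in mesh]
print(flat_triangles[0])  # each triangle is now flat; laying them out is the next stage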
Option 2:
Surround the object with a known, 2D-mapped shape and project each triangle onto that shape based on its normal. This provides a mechanism for unwrapping the UVs in a structured manner. An example would be to project onto a surrounding cube.
Option 3:
Consult academic papers and open-source libraries/tools like Blender:
https://wiki.blender.org/index.php/Doc:2.4/Manual/Textures/Mapping/UV/Unwrapping
Blender uses methods like those described above to unwrap arbitrary geometry. There are other methods to accomplish this, as described on the Blender unwrap page. The nice thing about Blender is that you can consult the source code for the implementation of the UV unwrap methods.
Hope this is helpful.

Converting an AutoCAD model to a matrix of points/volumes with the mass density specified at each location

I am an experimental physicist (grad student) that is trying to take an AutoCAD model of the experiment I've built and find the gravitational potential from the whole instrument over a specified volume. Before I find the potential, I'm trying to make a map of the mass density at each point in the model.
What's important is that I already have a model, and in the end I'll have something that says "At (x,y,z) the value is d". If that's a crazy CSV file, a numpy array, an Excel sheet, or... whatever, I'll be happy.
Here's what I've come up with so far:
Step 1: I color code the AutoCAD file so that color associates with material.
Step 2: I send the new drawing/model to a slicer (made for 3D printing). This takes my 3D object and turns it into equally spaced (in the z-direction) 2D objects... but then that's all output as g-code. But hey! G-code is a way of telling a motor how to move.
Step 3: This is the 'hard part' and the meat of this question. I'm thinking that I take that g-code, which is in essence just a set of instructions on how to move a nozzle, and use it to populate a numpy array. Basically I have a 3D array, each level corresponds to one position in z, and the remaining grid is my x-y plane. It reads what color is being put where, follows the nozzle, and puts that mass into those spots. It knows the mass because of the color. It follows the path by parsing the g-code.
When it is done with that level, it moves to the next grid and repeats.
Does this sound insane? Better yet, does it sound plausible? Or maybe someone has a smarter way of thinking about this.
Even if you just read all that, thank you. Seriously.
Does this sound insane? Better yet, does it sound plausible?
It's very reasonable and plausible. Using the g-code could work, but it would require a g-code interpreter that can map the instructions to a 2D path (not 3D, since you mentioned that you're taking fixed z-slices). That could be problematic, but if you found one it could work, though it may require some parser manipulation. There are several such interpreters in a variety of languages that could be useful.
SUGGESTION
From what you describe, it's akin to doing an MRI scan of the object and trying to determine its constituent mass profile along a given axis. In this case, and unlike MRI, you have multiple colors, which can be used to your advantage in region selection/identification.
Even if you used a g-code interpreter, it would reproduce an image whose area you'd still have to calculate. Noting that, and given that you seek to determine and classify material composition by path (in that the path defines the boundary of a particular material, which has a unique color), there may be a couple of ways to approach this without resorting to g-code:
1) If the colors of your material are easily (or reasonably) distinguishable, you can create a color mask which will quantify the occupied area, from which you can then determine the mass.
That is, if you take a photograph of the slice, load the image into a numpy array, and then search for a specific value (say red), you can identify the area of the region. Then, you apply a mask on your array. Once done, you count the occupied elements within your array, and then you divide it by the array size (i.e. rows by columns), which would give you the relative area occupied. Since you know the mass of the material, and there is a constant z-thickness, this will give you the relative mass. An example of color masking using numpy alone is shown here: http://scikit-image.org/docs/dev/user_guide/numpy_images.html
As such, let's define an example that's analogous to your problem: let's say we have a picture of a red cabbage, and we want to know how much of the picture contains red / purple-like pixels.
To simplify our life, we'll set any pixel above a certain threshold to white (RGB: 255,255,255), and then count how many non-white pixels there are:
from copy import deepcopy

import numpy as np
import matplotlib.pyplot as plt


def plot_image(fname, color=128, replacement=(255, 255, 255), plot=False):
    # 128 is a reasonable guess since most of the pixels in the image that have the
    # purplish hue have RGB values above this threshold.
    data = plt.imread(fname)
    image_data = deepcopy(data)               # copy the original data (for later use if need be)
    mask = image_data[:, :, 0] < color        # apply the color mask over the image data
    image_data[mask] = np.array(replacement)  # replace the matched pixels with white
    if plot:
        plt.imshow(image_data)
        plt.show()
    return data, image_data


data, image_data = plot_image('cabbage.jpg')  # load the image and apply the mask

# Find the locations of all the elements that are non-white (i.e. != 255).
# For an RGB image this returns 3 index arrays of the same size.
indices = np.where(image_data != 255)

# Now, calculate the relative occupied area: in this case, ~ 62.04 %
effective_area = indices[0].size / float(data.size)
The selected region in question is shown here below:
Note that image_data contains the pixel information that has been masked, and would provide the coordinates (albeit in pixel space) of where each occupied (i.e. non-white) pixel occurs. The issue with this, of course, is that these are pixel coordinates and not physical ones. But since you know the physical dimensions, extrapolating those quantities is easily done.
Furthermore, with the effective area known, and knowledge of the physical dimension, you have a good estimate of the real area occupied. To obtain better results, tweak the value of the color threshold (i.e. color). In your real-life example, since you know the color, search within a pixel range around that value (to offset noise and lighting issues).
The above method is a bit crude, but effective, and it may be worth exploring it in tandem with edge detection, as that could help improve the region identification and area selection (note that this isn't always strictly true!). Also, color deconvolution may be useful: http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_ihc_color_separation.html#sphx-glr-auto-examples-color-exposure-plot-ihc-color-separation-py
The downside to this is that the analysis requires a high-quality image and good lighting; and, most importantly, you'll likely lose some of the finer details of the edges, which would impact your masses.
2) Instead of resorting to camera work, and given that you have the AutoCAD model, you can use that and the software itself in addition to the above prescribed method.
Since you've colored each material in the model differently, you can use AutoCAD's slicing tool and do something similar to what the first method suggests doing physically: slicing the model and taking images of each slice to expose the surface. Then, using the same approach described above of color masking / edge detection / region determination through color selection, you should obtain a much better and (arguably) very accurate result.
The downside to this is that you're also limited by the image quality used. But, as it's software, that shouldn't be much of an issue, and you can get extremely high accuracy, close to the actual result.
The last suggestion to improve these results would be to script numerous random thin slicings of the AutoCAD model along a particular directional vector shared by every subsequent slice, export each exposed surface, analyze each image in the manner described above, and then collect those results to give you a Monte Carlo-like, statistically quantifiable determination of the mass (to correct for geometry effects due to slicing along one given axis). A sketch of that aggregation step is given below.
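As a rough illustration of that last step, here is a hedged sketch of turning per-slice occupied-area fractions (e.g. as returned by the masking function above for each exported slice image) into a mass estimate. The densities, slice thickness, and physical cross-section are placeholder values for illustration only:

import numpy as np

# Placeholder inputs: occupied-area fraction per slice for one material,
# one value per exported slice image.
area_fractions = np.array([0.62, 0.60, 0.65, 0.58])   # dimensionless

slice_thickness_cm = 0.1          # fixed z-spacing between slices (assumption)
cross_section_cm2 = 20.0 * 20.0   # physical area covered by each slice image (assumption)
density_g_per_cm3 = 2.70          # e.g. aluminium, identified by the slice's color

# Volume of the material in each slice, then the total mass.
slice_volumes = area_fractions * cross_section_cm2 * slice_thickness_cm
total_mass_g = density_g_per_cm3 * slice_volumes.sum()
print(round(total_mass_g, 1), 'g')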
