Determine the distance between a camera and a person/face in Python-OpenCV

There are many tutorials on how to calculate the distance between a camera and an object. Is it possible to calculate the approximate distance between a detected person and the camera using OpenCV?

Yes, it is possible. As #hkchengrex mentioned, treat the face as an object. There are plenty of methods; of those described at that link, I'd recommend SIFT feature matching.
Here, roughly, are the required steps (a minimal sketch follows the list):
1. Take a picture of the person and measure the distance manually.
2. Crop this picture so it contains only the person.
3. Extract the image features (e.g. as SIFT descriptors).
4. Take a second picture of the same person at an unknown distance.
5. Detect the person via SIFT matching (see the link above).
6. Compute a transformation between the two sets of SIFT features.
7. Apply the transformation to the distance measured in step 1.
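A minimal sketch of steps 3-7, assuming OpenCV >= 4.4 (earlier versions expose SIFT via cv2.xfeatures2d); the file names and the 2.0 m reference distance are made up for illustration:
import cv2
import numpy as np

REF_DISTANCE_M = 2.0  # distance measured manually for the reference photo (step 1)

# 'person_reference.jpg' is the cropped reference (step 2); 'person_unknown.jpg'
# is the new photo at unknown distance (step 4). Both names are placeholders.
ref = cv2.imread('person_reference.jpg', cv2.IMREAD_GRAYSCALE)
new = cv2.imread('person_unknown.jpg', cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)  # step 3
kp2, des2 = sift.detectAndCompute(new, None)

# Step 5: match descriptors and keep the unambiguous ones (Lowe's ratio test).
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Step 6: a similarity transform between the matched points gives the scale change.
M, _ = cv2.estimateAffinePartial2D(src, dst)
scale = np.sqrt(M[0, 0] ** 2 + M[0, 1] ** 2)

# Step 7 (intercept theorem): apparent size scales inversely with distance.
print('estimated distance: %.2f m' % (REF_DISTANCE_M / scale))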
It's best to start with the link provided and further SIFT tutorials for OpenCV. The required approach is a very simple one and will only work if the person in the picture being examined looks very similar to the person in picture one. For more advanced approaches I'd refer you to scientific papers; search for "person detection".
In reply to the comments:
TL;DR: a person with the same height/width in reality, but displayed smaller/larger in the image, can be measured with regard to distance.
The depicted approach works under the hood as follows. The person (= the cropped image) captured in step 2 can be found in any future image as long as he/she appears very similar. In the new image, this will give you the rectangular region where the person is located. As the dimensions of this rectangle are now smaller/larger, you can use that change to compute the transformation (which is basically the intercept theorem) and thereby the new distance. For example, if the rectangle is half as wide as in the reference picture, the person is about twice as far away.
What does this mean for a general approach measuring ANY person?
If the person has the same width/height as the person from step 2, this process works flawlessly. If they are of similar but not identical height/width, there will be calculation errors, but the results MAY still suffice for your use case. (You can define a generic human, e.g. 1.8 m in height and XX in width.) Nevertheless, SIFT might be a bit too specific here; sorry, I'd just refer you to Google to see what works best.
If your camera is fixed and the recorded scene doesn't change too much, I'd just define a ground plane and manually annotate every pixel projected onto this plane with a depth value. Then you only have to detect an arbitrary person, see where their feet touch the ground plane, and look up that pixel's predefined depth value.
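A minimal sketch of that lookup; the depth-map file and the bounding-box format are assumptions for illustration:
import numpy as np

# Hypothetical lookup table built once by hand: depth_map[row, col] holds the
# real-world distance (in metres) of the ground-plane point seen at that pixel.
depth_map = np.load('ground_plane_depths.npy')

def distance_for_person(bbox):
    # bbox = (x, y, w, h) of the detected person; the feet are assumed to
    # touch the ground at the bottom-centre of the bounding box.
    x, y, w, h = bbox
    return depth_map[y + h, x + w // 2]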
If the use case has higher demands, you'd have to measure depth in a more complex fashion. This can be done using a stereo-camera rig, a depth sensor, or an image sequence via structure from motion.
So there is no "one does it all" method in OpenCV; it always depends on the use case and the environment, and may require a combination of quite elaborate methods.

Related

calculate particle size distribution from AFM measurements

I am trying to obtain a radius and diameter distribution from some AFM (atomic force microscopy) measurements. So far I have been trying out Gwyddion, ImageJ and different workflows in Matlab.
At the moment the best result I have found is to use Gwyddion, take the phase image, high-pass filter it, and then try an edge detection with 'Laplacian of Gaussian'. The result is shown in figure 3. However, this image is still too noisy and doesn't really capture the edges of all the particles (some are merged together, others do not have a clear perimeter).
In the end I need an image that segments each of the spherical particles, which I can use for blob detection/analysis to obtain size/radius information.
Can anyone recommend a different method?
I would definitely try granulometry; it was designed for something really similar. There is a good explanation of granulometry here, starting at page 158.
Granulometry performs consecutive openings of increasing size that erase the different patterns according to their dimensions; the bigger the pattern, the later it is erased. This gives you a curve that represents the distribution of pattern dimensions in your image - exactly what you want. A sketch of the idea follows below.
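A minimal sketch of such a granulometry using OpenCV openings; the file name 'particles.png' and the radius range are made-up placeholders:
import cv2
import numpy as np

# Binary image of the particles: foreground = 255, background = 0.
img = cv2.imread('particles.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

areas = []
radii = range(1, 30)  # structuring-element radii to probe
for r in radii:
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    areas.append(cv2.countNonZero(opened))  # surviving foreground area

# The (negative) discrete derivative of this curve peaks at the dominant
# particle radii: the pattern-dimension distribution described above.
size_distribution = -np.diff(areas)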
However, it will not give you any information about the positions inside the image. If you want a rough model of the blobs present in your image, you can take a look at the Ultimate Opening.
Maybe you can use Avizo; it's a powerful piece of software for dealing with image problems, especially with 3D data (CT).

Converting an AutoCAD model to a matrix of points/volumes with the mass density specified at each location

I am an experimental physicist (grad student) trying to take an AutoCAD model of the experiment I've built and find the gravitational potential of the whole instrument over a specified volume. Before I find the potential, I'm trying to make a map of the mass density at each point in the model.
What's important is that I already have a model, and in the end I'll have something that says "at (x, y, z) the value is d". If that's a crazy CSV file, a numpy array, an Excel sheet, or... whatever, I'll be happy.
Here's what I've come up with so far:
Step 1: I color-code the AutoCAD file so that each color corresponds to a material.
Step 2: I send the new drawing/model to a slicer (made for 3D printing). This takes my 3D object and turns it into equally spaced (in the z-direction) 2D objects... but then that's all output as G-code. But hey! G-code is a way of telling a motor how to move.
Step 3: This is the 'hard part' and the meat of this question. I'm thinking that I take that G-code, which is in essence just a set of instructions on how to move a nozzle, and use it to populate a numpy array. Basically I have a 3D array; each level corresponds to one position in z, and the remaining grid is my x-y plane. The code reads what color is being put where, follows the nozzle, and puts that mass into those spots. It knows the mass because of the color. It follows the path by parsing the G-code.
When it is done with that level, it moves to the next grid and repeats.
Does this sound insane? Better yet, does it sound plausible? Or maybe someone has a smarter way of thinking about this.
Even if you just read all that, thank you. Seriously.
Does this sound insane? Better yet, does it sound plausible?
It's very reasonable and plausible. Using the G-code could do that, but it would require a G-code interpreter that can map the instructions to a 2D path (not 3D, since you mentioned that you're taking fixed z-slices). That could be problematic, but if you found one it could work, though it may require some parser manipulation. There are several of these in a variety of languages that could be useful; a bare-bones sketch of the parsing idea follows.
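For illustration only, a heavily simplified walk over the G-code that collects the XY points visited on each z layer; real G-code has arcs, relative moves and offsets that this ignores:
import re

def parse_gcode_paths(fname):
    # Track the nozzle position and collect the XY points visited per z layer.
    # Only absolute G0/G1 linear moves are handled here.
    layers = {}  # z -> list of (x, y) points
    x = y = z = 0.0
    coord = re.compile(r'([XYZ])(-?\d+\.?\d*)')
    with open(fname) as fh:
        for line in fh:
            line = line.split(';')[0].strip()  # drop comments
            parts = line.split()
            if not parts or parts[0] not in ('G0', 'G1', 'G00', 'G01'):
                continue
            for axis, value in coord.findall(line):
                if axis == 'X':
                    x = float(value)
                elif axis == 'Y':
                    y = float(value)
                else:
                    z = float(value)
            layers.setdefault(z, []).append((x, y))
    return layers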
SUGGESTION
From what you describe, it's akin to doing an MRI scan of the object and trying to determine its constituent mass profile along a given axis. In this case, and unlike MRI, you have multiple colors, so that can be used to your advantage in region selection / identification.
Even if you used a G-code interpreter, it would reproduce an image whose area you'd still have to calculate. Noting that, and given that you seek to determine and classify material composition by path (in that the path defines the boundary of a particular material, which has a unique color), there may be a couple of ways to approach this without resorting to G-code:
1) If the colors of your materials are easily (or reasonably) distinguishable, you can create a color mask that quantifies the occupied area, from which you can then determine the mass.
That is, if you take a photograph of the slice, load the image into a numpy array, and then search for a specific value (say, red), you can identify the area of the region. Then you apply a mask to your array. Once done, you count the occupied elements within your array and divide by the array size (i.e. rows times columns), which gives you the relative area occupied. Since you know the mass of the material, and there is a constant z-thickness, this gives you the relative mass. An example of color masking using numpy alone is shown here: http://scikit-image.org/docs/dev/user_guide/numpy_images.html
As such, let's define an example that's analogous to your problem: say we have a picture of a red cabbage, and we want to know how much of the picture contains red / purple-like pixels.
To simplify our life, we'll set any pixel above a certain threshold to white (RGB: 255, 255, 255) and then count how many non-white pixels there are:
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plt

def plot_image(fname, color=128, replacement=(255, 255, 255), plot=False):
    # 128 is a reasonable guess, since most of the pixels in the image that
    # have the purplish hue have RGB values above it.
    data = plt.imread(fname)
    image_data = deepcopy(data)  # copy the original data (for later use if need be)
    mask = image_data[:, :, 0] < color  # build the color mask from the red channel
    image_data[mask] = np.array(replacement)  # replace the matched pixels with white
    if plot:
        plt.imshow(image_data)
        plt.show()
    return data, image_data

data, image_data = plot_image('cabbage.jpg')  # load the image and apply the mask

# Find the locations of all the elements that are non-white (i.e. not 255).
# np.where returns 3 arrays (rows, columns, channels) of the same size.
indices = np.where(image_data != 255)

# Now, calculate the occupied fraction: in this case, ~ 62.04 %
effective_area = indices[0].size / float(data.size)
The selected region in question is shown below:
Note that image_data contains the pixel information that has been masked, and would provide the coordinates (albeit in pixel space) of each occupied (i.e. non-white) pixel. The issue, of course, is that these are pixel coordinates and not physical ones; but since you know the physical dimensions, converting those quantities is easily done.
Furthermore, with the effective area known, and knowledge of the physical dimensions, you have a good estimate of the real area occupied. To obtain better results, tweak the value of the color threshold (i.e. color). In your real-life example, since you know the color, search within a pixel range around that value (to offset noise and lighting issues).
The above method is a bit crude - but effective - and it may be worth exploring in tandem with edge detection, as that could help improve the region identification and area selection. (Note that this isn't always strictly true!) Also, color deconvolution may be useful: http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_ihc_color_separation.html#sphx-glr-auto-examples-color-exposure-plot-ihc-color-separation-py
The downside is that the analysis requires a high-quality image and good lighting; and, most importantly, you'll likely lose some of the finer details of the edges, which would impact your masses.
2) Instead of resorting to camera work, and given that you have the AutoCAD model, you can use the model and the software itself in addition to the method prescribed above.
Since you've colored each material in the model differently, you can use AutoCAD's slicing tool and do something similar to what the first method suggests doing physically: slice the model and take pictures of each slice to expose the surface. Then, using the same color masking / edge detection / region determination through color selection described above, you should obtain a much better and (arguably) very accurate result.
The downside to this is that you're still limited by the image quality used. But as it's software, that shouldn't be much of an issue, and you can get extremely high accuracy - close to the actual result.
The last suggestion for improving these results would be to script numerous random thin slicings of the AutoCAD model along a directional vector shared by every subsequent slice, export each exposed surface, analyze each image in the manner described above, and then collect those results to give you a Monte Carlo-like, statistically quantifiable determination of the mass (to correct for geometry effects due to slicing along one given axis).

Detect grid nodes using OpenCV (or using something else)

I have a grid in pictures (they are from a camera). After binarization they look like this (red is 255, blue is 0):
What is the best way to detect the grid nodes (crosses) in these pictures?
Note: the grid is distorted non-uniformly from cell to cell.
Update:
Some examples of different grids and their distortions before binarization:
In cases like this I first try to find the best starting point.
So, first I thresholded your image (however, I could also have skeletonized it and only then thresholded it, but that way some data is lost irrecoverably):
Then I tried loads of tools to get the most prominent features emphasized in bulk. Finally, playing with Gimp's G'MIC plugin, I found this:
Based on the above I prepared a universal pattern that looks like this:
Then I just took a part of this image:
To help determine the angle I made a local Fourier frequency graph - this way you can obtain the local angle of your pattern:
Then you can use a simple trick that runs fast on modern GPUs - take the difference, like this (a missed case):
When there is a hit, the difference is minimal; what I had in mind when talking about local maxima refers more or less to how the resulting difference should be treated. It wouldn't be wise to weight the difference outside the pattern circle the same as inside, due to scale-factor sensitivity; thus the inside, with the cross, should be weighted more in the algorithm used. Nevertheless, the pattern differenced with the image looks like this:
As you can see, it's possible to differentiate between a hit and a miss. What is crucial is to set a proper tolerance and to use the Fourier frequencies to obtain the angle (with thresholded images, the Fourier spectrum usually follows the overall orientation of the analyzed image).
The above approach can later be complemented by Harris detection, or Harris detection can be modified using the above patterns to distinguish two to four closely placed corners.
Unfortunately, all of these techniques are scale-dependent in this case and should be adjusted for it properly.
There are also other approaches to your problem - for instance, watershedding it first, then getting regions, then disregarding the foreground, then simplifying the curves, then checking whether their corners form a consecutive equidistant pattern. But to my nose this would not produce correct results.
One more thing: libgmic is the G'MIC library, from which you can use the transformations shown above, directly or through bindings. Or take the algorithms and rewrite them for your app.
I suppose that this can be a potential answer (it was actually mentioned in the comments): http://opencv.itseez.com/2.4/modules/imgproc/doc/feature_detection.html?highlight=hough#houghlinesp
There may also be other ways, using skimage tools for feature detection.
But actually, I think that instead of the Hough transformation, which could contribute to huge bloat and a lack of precision (straight lines), I would suggest trying Harris corner detection - http://docs.opencv.org/2.4/doc/tutorials/features2d/trackingmotion/harris_detector/harris_detector.html .
This can be further adjusted to your specific issue (cross corners, so the local maximum should depend on the cross-like distribution). Then some curve approximation can be done based on the points obtained. A sketch of the Harris step follows.
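A minimal Harris sketch along the lines of the linked tutorial; the file name and the tuning values (block size, aperture, k, threshold) are guesses that need adjusting to your grid:
import cv2
import numpy as np

img = cv2.imread('grid.png', cv2.IMREAD_GRAYSCALE)
gray = np.float32(img)

# Harris corner response; block size (5), Sobel aperture (3) and k (0.04)
# are the usual tuning knobs from the tutorial.
response = cv2.cornerHarris(gray, 5, 3, 0.04)

# Keep only strong responses; at grid crossings the corner response
# should peak within a small neighbourhood.
nodes = np.argwhere(response > 0.01 * response.max())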
Maybe you could calculate Hough lines and determine the intersections. The OpenCV documentation can be found here; a sketch of this route follows.
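A sketch of the Hough-lines route, assuming a binarized grid image named 'grid.png'; the Canny and Hough parameters are placeholders to tune:
import cv2
import numpy as np

img = cv2.imread('grid.png', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=5)

def intersection(l1, l2):
    # Intersection of two segments treated as infinite lines,
    # or None if they are near-parallel.
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return (px, py)

nodes = []
for i in range(len(lines)):
    for j in range(i + 1, len(lines)):
        p = intersection(lines[i][0], lines[j][0])
        if p is not None:
            nodes.append(p)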

How to track and count multiple cars in a video using contours?

Steps I have followed:
Background subtraction with preprocessing.
Contour detection.
With these two steps, I am able to draw contours around all moving cars in the video. But how do I track the contours to count the number of cars in the video?
I searched around a bit, and there seem to be different techniques like the Kalman filter, Lucas-Kanade and optical flow... but I don't know which one to use for my use case. I am using opencv3-python.
Actually, this seems like a general question, but I am going to give a point of view (I had the same problem, but with point clouds; although it may be different from what you asked, I hope it will give you an idea of how to proceed).
Most of the time, once your contours are detected, tracking moving objects in the scene involves 3 main steps:
Feature Matching:
This step is about detecting features of your object in frame N and matching them to features of objects in frame N+1. The detection part has some standard algorithms and descriptors available in OpenCV (SURF, SIFT, ORB...), as does the feature matching part.
Kalman Filter
The Kalman filter is used to get an initial prediction (generally by applying a constant-velocity model to your objects). For each appearance point of the track, a correspondence search is executed. If the average distance is above a specified threshold, feature matching is applied to get a better initial estimate.
In order to do that, you need to model your problem in a way that can be solved by a Kalman filter; a sketch of such a model follows.
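A minimal constant-velocity model using OpenCV's cv2.KalmanFilter, tracking one contour centroid; the noise covariances and the example measurement are illustrative values only:
import cv2
import numpy as np

# State: [x, y, vx, vy]; measurement: the contour centroid [x, y].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

prediction = kf.predict()  # where we expect the car in the next frame
centroid = np.array([[320.0], [240.0]], np.float32)  # e.g. from cv2.moments
kf.correct(centroid)  # refine the state with the actual observation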
Dynamic Mapping
After the motion estimation, the appearance of each track is updated. In contrast to standard mapping techniques, dynamic mapping is an approach that tries to accumulate appearance details of both static and dynamic objects, thus refining your motion estimation and tracking process.
There are a lot of papers out there; you may as well take a further look at these:
Robust Visual Tracking and Vehicle Classification via Sparse Representation
Motion Estimation from Range Images in Dynamic Outdoor Scenes
Multiple Objects Tracking using CAMshift Algorithm in OpenCV
Hope it helps!

Detect the location of an image within a larger image

How do you detect the location of an image within a larger image? I have an unmodified copy of the image. This image is then changed to an arbitrary resolution and placed randomly within a much larger image which is of an arbitrary size. No other transformations are conducted on the resulting image. Python code would be ideal, and it would probably require libgd. If you know of a good approach to this problem you'll get a +1.
There is a quick and dirty solution: simply slide a window over the target image and compute some measure of similarity at each location, then pick the location with the highest similarity. Then compare that similarity to a threshold: if the score is above the threshold, you conclude the image is there, and that's the location; if the score is below the threshold, the image isn't there.
As a similarity measure, you can use normalized correlation or the sum of squared differences (a.k.a. the L2 norm). As people have mentioned, this will not deal with scale changes, so you also rescale your original image multiple times and repeat the process above with each scaled version. Depending on the size of your input image and the range of possible scales, this may be good enough, and it's easy to implement (a sketch follows).
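A sketch of this multi-scale search using OpenCV's normalized cross-correlation; the file names, the scale range and the 0.8 threshold are placeholders:
import cv2
import numpy as np

template = cv2.imread('original.png', cv2.IMREAD_GRAYSCALE)
scene = cv2.imread('larger.png', cv2.IMREAD_GRAYSCALE)

best = (-1.0, None, None)  # (score, location, scale)
for scale in np.linspace(0.2, 2.0, 30):  # plausible range of resizes
    resized = cv2.resize(template, None, fx=scale, fy=scale)
    if resized.shape[0] > scene.shape[0] or resized.shape[1] > scene.shape[1]:
        continue
    # Normalized cross-correlation over every window position.
    result = cv2.matchTemplate(scene, resized, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val > best[0]:
        best = (max_val, max_loc, scale)

score, loc, scale = best
if score > 0.8:  # the threshold is use-case dependent
    print('found at %s, scale %.2f (score %.2f)' % (loc, scale, score))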
A proper solution is to use affine invariants. Try looking up "wide-baseline stereo matching"; people have looked at this problem in that context. The methods used are generally something like this:
Preprocessing of the original image:
Run an "interest point detector". This will find a few points in the image that are easily localizable, e.g. corners. There are many detectors; a detector called "Harris-affine" works well and is pretty popular (so implementations probably exist). Another option is the Difference-of-Gaussians (DoG) detector; it was developed for SIFT and works well too.
At each interest point, extract a small sub-image (e.g. 30x30 pixels).
For each sub-image, compute a "descriptor", some representation of the image content in that window. Again, many descriptors exist. Things to look at are how well the descriptor describes the image content (you want two descriptors to match only if they are similar) and how invariant it is (you want it to be the same even after scaling). In your case, I'd recommend using SIFT. It is not as invariant as some other descriptors, but it copes with scale well, and in your case scale is the only thing that changes.
At the end of this stage, you will have a set of descriptors.
Testing (with the new test image):
First, you run the same interest point detector as in step 1 and get a set of interest points. You compute the same descriptor for each point, as above. Now you have a set of descriptors for the target image as well.
Next, you look for matches. Ideally, for each descriptor from your original image, there will be some pretty similar descriptor in the target image. (Since the target image is larger, there will also be "leftover" descriptors, i.e. points that don't correspond to anything in the original image.) So if enough of the original descriptors match with enough similarity, you know the target is there. Moreover, since the descriptors are location-specific, you will also know where in the target image the original image is (a sketch of both stages follows).
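A condensed sketch of both stages with SIFT and a RANSAC homography (OpenCV >= 4.4 assumed; the file names are placeholders):
import cv2
import numpy as np

original = cv2.imread('original.png', cv2.IMREAD_GRAYSCALE)
target = cv2.imread('target.png', cv2.IMREAD_GRAYSCALE)

# Preprocessing and testing: detect interest points and compute descriptors.
sift = cv2.SIFT_create()
kp_orig, des_orig = sift.detectAndCompute(original, None)
kp_tgt, des_tgt = sift.detectAndCompute(target, None)

# Match descriptors; keep the unambiguous matches (Lowe's ratio test).
matches = cv2.BFMatcher().knnMatch(des_orig, des_tgt, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

if len(good) >= 4:  # need at least 4 correspondences for a homography
    src = np.float32([kp_orig[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_tgt[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = original.shape
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    outline = cv2.perspectiveTransform(corners, H)  # location in the target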
You probably want cross-correlation. (Autocorrelation is correlating a signal with itself; cross-correlation is correlating two different signals.)
What correlation does for you, over simply checking for exact matches, is tell you where the best matches are and how good they are. The flip side is that, for a 2-D picture, it's something like O(N^3), and it's not that simple an algorithm. But it's magic once you get it to work.
EDIT: Aargh, you specified an arbitrary resize. That's going to break any correlation-based algorithm. Sorry, you're outside my experience now, and SO won't let me delete this answer.
http://en.wikipedia.org/wiki/Autocorrelation is my first instinct.
Take a look at scale-invariant feature transforms (SIFT); there are many different flavors that may be more or less tailored to the type of images you happen to be working with.
