Determining the pattern orientation of a spatiotemporal image

Determining the pattern orientation of a spatiotemporal image - python

How can I obtain average direction of the pattern shown in the figure below. It is the direction of the red arrow relative to the yellow (horizontal) line. Any ideas for an approach? I couldn't figure out a way to approach. This is a spatio-temporal image created from a video. Thank you.
Here is my original image:

The simplest approach would be to compute the gradient vector (x derivative and y derivative) and find its direction at each pixel (atan2(y,x)). The average orientation is what you want, not the average direction (will cancel out). So apply modulus pi, then average across the image.
The best way to compute image gradients is through the Gaussian gradients.
The structure tensor is the more robust way of accomplishing this. In short, it computes local averages of the gradient vector to reduce the effect of noise. It does this by computing the outer product of the gradient vector with itself, which produces a symmetric matrix. The individual components of this matrix can then be locally averaged (i.e. apply a smoothing filter). This is similar to computing the angle of the vectors, doubling the angles to make vectors in opposite directions equal, then averaging them.
Note that you can apply either of these solutions in 3D (you can think of the video data as 2D + time = 3D). That way, you compute both the speed and the direction of motion within the 2D frame, rather than just the speed along the direction in which you extracted the 2D image you show in the question. The image gradient and the concept of the structure tensor easily extend to 3D. This 3D approach is similar to the approach by Lucas-Kanade for optical flow. If you follow that link to Wikipedia, you'll see it uses the structure tensor in 2D, and adds gradients along the time dimension.

Might be useful to try Fourier transform.
In your case you should get two vertical lines in the middle of the transformed image corresponding to the information when traveling vertically in the image.
On the other hand there shouldn't be a horizontal line since when traveling horizontally in the image there is little information (little change)
For example you can use this online site to play with fourier transforms:
https://www.ejectamenta.com/Fourifier-fullscreen/
It might sound like the problem remains the same but in fact it is much easier now.
The 2D pattern is converted into dominant lines which are quite easy to find in the transformed image.
For example you can search for the strongest pixels in the image and simply determine if they are more likely to be horizontal line or a vertical line or determine the angle of the dominant line. Then rotate by 90 degrees.
For example see this image of wood grain and the resulting transformed image:
And don't worry about the two lines. The image is symmetric so just ignore 3/4 of the image and look in 1 quarter.

I recommend giving the Hough transform a go, it is available in OpenCv. The Hough transform maps lines to angles, and might be useful in your case.

Related

Efficient way to draw line with specific falloff (eg blurry line)

Not completely sure what to call this problem but I will try my best to explain it here.
I have the coordinates of a line I want to draw onto a numpy array. However, I don't just want a simple line, but a thick line where I can specify the falloff (brightness with distance from the line) with a curve or mathematic function. For example, I might want to have a gaussian falloff, which would look something similar to the example below where a gaussian blur was applied to the image.
However, using blur filters does not allow the flexibility in functions I would like and does not enable precise control of the falloff (for example, when I want points on the line to have exactly value 1.0 and points further than say 10 pixels away to be 0.0).
I have attempted to solve this problem by creating the falloff pattern for a point, and then drawing that pattern into a new numpy channel for every point of the line, before merging them via the max function. This works but is too slow.
Is there a more efficient way to draw such a line from my input coordinates?

The solution I came up with is to make use of dilations. This method is more general and can be applied to any polygonal shape or binary mask.
Rasterize geometry the simple way first. For points set the corresponding pixel; for lines draw 1 pixel thick lines with library function from opencv or similar; for polygons draw the boundary or fill the polygon with opencv functions. This results in the initial mask with value 1 on the lines.
Iteratively apply dilations to this mask. This grows the mask pixel by pixel. Set the strength of the new mask according to an arbitrary falloff function.
The dilation operation is available in opencv. Alternatively, it can efficiently be implemented as a simple convolution with boolean matrices, which can then run on GPU devices.
An example of the results can be seen with the polygonal input:
Exponential falloff:
Sinusoidal falloff:

Conversion from pixel to general Metric(mm, in)

I am using openCV to process an image and use houghcircles to detect the circles in the image under test, and also calculating the distance between their centers using euclidean distance.
Since this would be in pixels, I need the absolute distances in mm or inches, can anyone let me know how this can be done
Thanks in advance.

The image formation process implies taking a 2D projection of the real, 3D world, through a lens. In this process, a lot of information is lost (e.g. the third dimension), and the transformation is dependent on lens properties (e.g. focal distance).
The transformation between the distance in pixels and the physical distance depends on the depth (distance between the camera and the object) and the lens. The complex, but more general way, is to estimate the depth (there are specialized algorithms which can do this under certain conditions, but require multiple cameras/perspectives) or use a depth camera which can measure the depth. Once the depth is known, after taking into account the effects of the lens projection, an estimation can be made.
You do not give much information about your setup, but the transformation can be measured experimentally. You simply take a picture of an object of known dimensions and you determine the physical dimension of one pixel (e.g. if the object is 10x10 cm and in the picture it has 100x100px, then 10px is 1mm). This is strongly dependent on the distance to the camera from the object.
An approach a bit more automated is to use a certain pattern (e.g. checkerboard) of known dimensions. It can be automatically detected in the image and the same transformation can be performed.

opencv: reprojectImageTo3d what is the metric unit of the (X,Y,Z) point?

firstly, I wanted to know the metric unit of the 3d point we got from the opencv reprojectImageTo3D() function.
secondly, I have calibrated each camera individually with a chessboard with "mm" as metric unit and then use the opencv functions to calibrate the stereo system, rectify the stereo pair and then compute the disparity map.
Basically i want the distance of a center of a bounding box.
so i compute the disparity map and reproject it to 3D with the reprojectImageTo3D() function and then i take from those 3D points, the one which correspond to the center of the bbox (x, y).
But which image should i use to get the center of bbox? the rectified or the original?
Secondly, is it better to use the same camera model for a stereo system?
Thank you

During the calibration process (calibrateCamera) you have to give the points grid of your calibration target. The unit that you give there will then define the unit for the rest of the process.
When calling reprojectImageTo3D, you probably used the matrix Q output by stereoRectify, which takes in the individual calibrations (cameraMatrix1, cameraMatrix2). That's where the unit came from.
So in your case you get mm I guess.
reprojectImageTo3D has to use the rectified image, since the disparity is calculated using the rectified image (It wouldn't be properly aligned otherwise). Also, when calculating the disparity, it is calculated relative to the first image given (left one in the doc). So you should use the left rectified image if you computed the disparity like this: cv::StereoMatcher::compute(left, right)
I never had two different cameras, but it makes sense to use the same ones. I think that if you have very different color images, edges or any image difference, that could potentially influence the disparity quality.
What is actually very important (unless you are only working with still pictures), is to use cameras that can be synchronized by hardware (e.g. GENLOCK signal: https://en.wikipedia.org/wiki/Genlock). If you have a bit of delay between left and right and a moving subject, the disparity can be wrong. This is also true for the calibration.
Hope this helps!

Problems understand preprocessing steps that turn images into the format of MNIST dataset

I want to make a program that turns a given image into the format of the MNIST dataset, as a kind of exercise to understand the various preprocessing steps involved. But the description the authors made on their site: http://yann.lecun.com/exdb/mnist/ was not entirely straightforward:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. the
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
So from the original I have to normalize it to fit a 20x20 box, and still preserving their aspect ratio (I think they mean the aspect ratio of the actual digit, not the entire image). Still I really don't know how to do this.
Center of mass: I have found some online code about this, but I don't think I understand the principle. Here is my take on this: The coordinate of each pixel is actually a vector from the origin to that point, so for each point you multiply the coordinate with the image intensity, then sum everything, before dividing by the total intensity of the image. I may be wrong about this :(
Translating the image so as to position this point at the center: Maybe cook up some translation equation, or maybe use a convolutional filter to facilitate translation, then find a path that leads to the center (Dijikstra's shortest path ?).
All in all, I think i still need guidance on this. Can anyone explain about these parts for me ? Thank you very much.
I think

Image registration using python and cross-correlation

I got two images showing exaktly the same content: 2D-gaussian-shaped spots. I call these two 16-bit png-files "left.png" and "right.png". But as they are obtained thru an slightly different optical setup, the corresponding spots (physically the same) appear at slightly different positions. Meaning the right is slightly stretched, distorted, or so, in a non-linear way. Therefore I would like to get the transformation from left to right.
So for every pixel on the left side with its x- and y-coordinate I want a function giving me the components of the displacement-vector that points to the corresponding pixel on the right side.
In a former approach I tried to get the positions of the corresponding spots to obtain the relative distances deltaX and deltaY. These distances then I fitted to the taylor-expansion up to second order of T(x,y) giving me the x- and y-component of the displacement vector for every pixel (x,y) on the left, pointing to corresponding pixel (x',y') on the right.
To get a more general result I would like to use normalized cross-correlation. For this I multiply every pixelvalue from left with a corresponding pixelvalue from right and sum over these products. The transformation I am looking for should connect the pixels that will maximize the sum. So when the sum is maximzied, I know that I multiplied the corresponding pixels.
I really tried a lot with this, but didn't manage. My question is if somebody of you has an idea or has ever done something similar.
import numpy as np
import Image
left = np.array(Image.open('left.png'))
right = np.array(Image.open('right.png'))
# for normalization (http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation)
left = (left - left.mean()) / left.std()
right = (right - right.mean()) / right.std()
Please let me know if I can make this question more clear. I still have to check out how to post questions using latex.
Thank you very much for input.
[left.png] http://i.stack.imgur.com/oSTER.png
[right.png] http://i.stack.imgur.com/Njahj.png
I'm afraid, in most cases 16-bit images appear just black (at least on systems I use) :( but of course there is data in there.
UPDATE 1
I try to clearify my question. I am looking for a vector-field with displacement-vectors that point from every pixel in left.png to the corresponding pixel in right.png. My problem is, that I am not sure about the constraints I have.
where vector r (components x and y) points to a pixel in left.png and vector r-prime (components x-prime and y-prime) points to the corresponding pixel in right.png. for every r there is a displacement-vector.
What I did earlier was, that I found manually components of vector-field d and fitted them to a polynom second degree:
So I fitted:
and
Does this make sense to you? Is it possible to get all the delta-x(x,y) and delta-y(x,y) with cross-correlation? The cross-correlation should be maximized if the corresponding pixels are linked together thru the displacement-vectors, right?
UPDATE 2
So the algorithm I was thinking of is as follows:
Deform right.png
Get the value of cross-correlation
Deform right.png further
Get the value of cross-correlation and compare to value before
If it's greater, good deformation, if not, redo deformation and do something else
After maximzied the cross-correlation value, know what deformation there is :)
About deformation: could one do first a shift along x- and y-direction to maximize cross-correlation, then in a second step stretch or compress x- and y-dependant and in a third step deform quadratic x- and y-dependent and repeat this procedure iterativ?? I really have a problem to do this with integer-coordinates. Do you think I would have to interpolate the picture to obtain a continuous distribution?? I have to think about this again :( Thanks to everybody for taking part :)

OpenCV (and with it the python Opencv binding) has a StarDetector class which implements this algorithm.
As an alternative you might have a look at the OpenCV SIFT class, which stands for Scale Invariant Feature Transform.
Update
Regarding your comment, I understand that the "right" transformation will maximize the cross-correlation between the images, but I don't understand how you choose the set of transformations over which to maximize. Maybe if you know the coordinates of three matching points (either by some heuristics or by choosing them by hand), and if you expect affinity, you could use something like cv2.getAffineTransform to have a good initial transformation for your maximization process. From there you could use small additional transformations to have a set over which to maximize. But this approach seems to me like re-inventing something which SIFT could take care of.
To actually transform your test image you can use cv2.warpAffine, which also can take care of border values (e.g. pad with 0). To calculate the cross-correlation you could use scipy.signal.correlate2d.
Update
Your latest update did indeed clarify some points for me. But I think that a vector field of displacements is not the most natural thing to look for, and this is also where the misunderstanding came from. I was thinking more along the lines of a global transformation T, which applied to any point (x,y) of the left image gives (x',y')=T(x,y) on the right side, but T has the same analytical form for every pixel. For example, this could be a combination of a displacement, rotation, scaling, maybe some perspective transformation. I cannot say whether it is realistic or not to hope to find such a transformation, this depends on your setup, but if the scene is physically the same on both sides I would say it is reasonable to expect some affine transformation. This is why I suggested cv2.getAffineTransform. It is of course trivial to calculate your displacement Vector field from such a T, as this is just T(x,y)-(x,y).
The big advantage would be that you have only very few degrees of freedom for your transformation, instead of, I would argue, 2N degrees of freedom in the displacement vector field, where N is the number of bright spots.
If it is indeed an affine transformation, I would suggest some algorithm like this:
identify three bright and well isolated spots on the left
for each of these three spots, define a bounding box so that you can hope to identify the corresponding spot within it in the right image
find the coordinates of the corresponding spots, e.g. with some correlation method as implemented in cv2.matchTemplate or by also just finding the brightest spot within the bounding box.
once you have three matching pairs of coordinates, calculate the affine transformation which transforms one set into the other with cv2.getAffineTransform.
apply this affine transformation to the left image, as a check if you found the right one you could calculate if the overall normalized cross-correlation is above some threshold or drops significantly if you displace one image with respect to the other.
if you wish and still need it, calculate the displacement vector field trivially from your transformation T.
Update
It seems cv2.getAffineTransform expects an awkward input data type 'float32'. Let's assume the source coordinates are (sxi,syi) and destination (dxi,dyi) with i=0,1,2, then what you need is
src = np.array( ((sx0,sy0),(sx1,sy1),(sx2,sy2)), dtype='float32' )
dst = np.array( ((dx0,dy0),(dx1,dy1),(dx2,dy2)), dtype='float32' )
result = cv2.getAffineTransform(src,dst)

I don't think a cross correlation is going to help here, as it only gives you a single best shift for the whole image. There are three alternatives I would consider:
Do a cross correlation on sub-clusters of dots. Take, for example, the three dots in the top right and find the optimal x-y shift through cross-correlation. This gives you the rough transform for the top left. Repeat for as many clusters as you can to obtain a reasonable map of your transformations. Fit this with your Taylor expansion and you might get reasonably close. However, to have your cross-correlation work in any way, the difference in displacement between spots must be less than the extend of the spot, else you can never get all spots in a cluster to overlap simultaneously with a single displacement. Under these conditions, option 2 might be more suitable.
If the displacements are relatively small (which I think is a condition for option 1), then we might assume that for a given spot in the left image, the closest spot in the right image is the corresponding spot. Thus, for every spot in the left image, we find the nearest spot in the right image and use that as the displacement in that location. From the 40-something well distributed displacement vectors we can obtain a reasonable approximation of the actual displacement by fitting your Taylor expansion.
This is probably the slowest method, but might be the most robust if you have large displacements (and option 2 thus doesn't work): use something like an evolutionary algorithm to find the displacement. Apply a random transformation, compute the remaining error (you might need to define this as sum of the smallest distance between spots in your original and transformed image), and improve your transformation with those results. If your displacements are rather large you might need a very broad search as you'll probably get lots of local minima in your landscape.
I would try option 2 as it seems your displacements might be small enough to easily associate a spot in the left image with a spot in the right image.
Update
I assume your optics induce non linear distortions and having two separate beampaths (different filters in each?) will make the relationship between the two images even more non-linear. The affine transformation PiQuer suggests might give a reasonable approach but can probably never completely cover the actual distortions.
I think your approach of fitting to a low order Taylor polynomial is fine. This works for all my applications with similar conditions. Highest orders probably should be something like xy^2 and x^2y; anything higher than that you won't notice.
Alternatively, you might be able to calibrate the distortions for each image first, and then do your experiments. This way you are not dependent on the distribution of you dots, but can use a high resolution reference image to get the best description of your transformation.
Option 2 above still stands as my suggestion for getting the two images to overlap. This can be fully automated and I'm not sure what you mean when you want a more general result.
Update 2
You comment that you have trouble matching dots in the two images. If this is the case, I think your iterative cross-correlation approach may not be very robust either. You have very small dots, so overlap between them will only occur if the difference between the two images is small.
In principle there is nothing wrong with your proposed solution, but whether it works or not strongly depends on the size of your deformations and the robustness of your optimization algorithm. If you start off with very little overlap, then it may be hard to find a good starting point for your optimization. Yet if you have sufficient overlap to begin with, then you should have been able to find the deformation per dot first, but in a comment you indicate that this doesn't work.
Perhaps you can go for a mixed solution: find the cross correlation of clusters of dots to get a starting point for your optimization, and then tweak the deformation using something like the procedure you describe in your update. Thus:
For a NxN pixel segment find the shift between the left and right images
Repeat for, say, 16 of those segments
Compute an approximation of the deformation using those 16 points
Use this as the starting point of your optimization approach

You might want to have a look at bunwarpj which already does what you're trying to do. It's not python but I use it in exactly this context. You can export a plain text spline transformation and use it if you wish to do so.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.