First, pardon me if the question is a little hard to understand; I am still a novice and trying my best to express my problem.
I am trying to implement the road lane detection method from the thesis Effective lane detection and tracking method using statistical modeling of color and lane edge-orientation (maybe not all of you can access this thesis).
From the thesis:
"A proposed lane segmentation method uses two distinctive features when there is an input image f (x, y) ,Z = [z1, z2 ]T for classifying lane pixels: lane HSV color feature z1 and lane edge-orientation feature z2, which can be defines as Z = [z1, z2]T = [I'(x, y), ø(x,y)]T
What I want to know is: does Z itself have only two elements, each element corresponding to one pixel, which would mean I end up with a feature vector Z(x, y) for every pixel?
Or will I have only one feature vector Z that already contains a long list of the I' and ø values of every pixel?
And how can I store this feature vector in Python (using some library)? I have already done some searching but am still a little confused. It would be helpful if someone could at least give me a keyword so I can search deeper.
@Hilman, the first thing is to understand what a feature vector is. A feature is a description of your data (in this case a pixel) using some properties of that data, for example its mean, variance, or color [r, g, b], or it may be the output of applying some transformation (such as a color space conversion) that converts your data into a form more suitable for classification or prediction.
What I understand from your question is that the proposed algorithm takes the HSV value of each pixel (via color space conversion) along with its gradient direction (phase). If you combine them, you get a 4-element column vector for each pixel. So the feature vector Z consists of [H, S, V, phase] for each pixel, along with the class annotation of that pixel.
In Python, if you want to store a feature vector, you can build a NumPy array and write it out as a CSV file.
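For example, a minimal sketch of storing one [H, S, V, phase] row per pixel (the file names and the Sobel-based phase computation are my own illustrative choices, not from the thesis):

import cv2
import numpy as np

img = cv2.imread("road.png")                    # hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
phase = np.arctan2(gy, gx)                      # gradient direction per pixel

features = np.column_stack([
    hsv.reshape(-1, 3).astype(np.float64),      # H, S, V per pixel
    phase.reshape(-1, 1),                       # edge orientation per pixel
])                                              # shape: (height*width, 4)

np.savetxt("features.csv", features, delimiter=",")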
Thank You
I am trying to develop a 3D cube from values on a flat 2D plane. I am having a lot of difficulty trying to pseudocode it out, so I was hoping to get some input from you.
I will try my best to express myself through pictures as I am able to visualize what I am trying to achieve.
I have a 2D output based on the black line in this figure:
I have an array where each index's value is an amplitude, i.e. (0; 1) -> 0 is the x coordinate (sample) and 1 is the y coordinate (amplitude), or, as another example, (~1900; ~0.25).
How do I take this 1-dimensional sequence and extrude it into a 3D picture like the one below:
Is there perhaps a library that does this? Or am I going about it the wrong way? The data is from the matched filter output of a sonar signal, and I wish to visualize the concentration of the intensity versus where it is located in a sample on a 3D plane. The data has peaks with inclining and declining gradient slopes leading up to each peak.
I cannot seem to wrap my mind around such a task. Is there a library or a term used to associate what I wish to accomplish?
EDIT: I found this https://www.tutorialspoint.com/matplotlib/matplotlib_3d_surface_plot.htm
But it requires all x, y and z points, whereas I only have x and y. Additionally, I need to be able to access every coordinate (x, y, z) to do range and angle estimation from sample (0, 1) (the transmitted sound, where power is highest). I would basically only like to see the top of this on another 2D axis, though...
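For reference, a minimal sketch of one way to build the missing z values by extruding the 1D profile along a second axis with matplotlib's plot_surface (the synthetic signal below is just a placeholder for the matched-filter output):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib

amplitude = np.abs(np.sinc(np.linspace(-10, 10, 400)))   # placeholder 1D signal
x = np.arange(len(amplitude))                            # sample index
y = np.arange(50)                                        # arbitrary depth of the extrusion
X, Y = np.meshgrid(x, y)
Z = np.tile(amplitude, (len(y), 1))                      # repeat the 1D profile along y

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
plt.show()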
EDIT 2: Following up on a comment below, I would like to convert Figure 1 above into the image below, using a library if one exists.
Thanks so much in advance!
How can I design simple code to geometrically quantify a 2D rough surface based on given scatter points? For example, using a single number: r=0 for a smooth surface, r=1 for a very rough surface, and 0 < r < 1 for a surface that is somewhere in between.
To more explicitly illustrate this question, the attached figure below is used to show several sketches of 2D rough surfaces. The dots are the scattered points with given coordinates. Accordingly, every two adjacent dots can be connected and a normal vector of each segment can be computed (marked with arrow). I would like to design a function like
def roughness(x, y):
    ...
    return r
where x and y are sequences of coordinates of the scatter points. For example, in case (a), x=[0,1,2,3,4,5,6], y=[0,1,0,1,0,1,0]; in case (b), x=[0,1,2,3,4,5], y=[0,0,0,0,0,0]. When we call the function roughness(x, y), we should get r=1 (very rough) for case (a) and r=0 (smooth) for case (b), and maybe r=0.5 (medium) for case (d). The question thus comes down to: what appropriate components do we need to put inside the function roughness?
Some initial thoughts:
Roughness of a surface is a local concept: we only consider it within a specific area, i.e. using only several local points around the location of interest. Use the mean of the local normal vectors? This may fail: (a) and (b) have the same mean, (0, 1), but (a) is a rough surface and (b) is a smooth one. Use the variance of the local normal vectors? This may also fail: (c) and (d) have the same variance, but (c) is rougher than (d).
maybe something like this:
import numpy as np

def roughness(x, y):
    # angles of the segments between successive points
    t = np.arctan2(np.diff(y), np.diff(x))
    # dt[i] = sin(t[i+1] - t[i]), the change in angle between successive segments
    ts = np.sin(t)
    tc = np.cos(t)
    dt = ts[1:] * tc[:-1] - tc[1:] * ts[:-1]
    # mean of the squared angle changes
    return np.sum(dt**2) / len(dt)
This would give you something like what you're asking for.
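For instance, on the two example cases from the question it returns the values you describe:

print(roughness([0, 1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1, 0]))   # case (a): 1.0
print(roughness([0, 1, 2, 3, 4, 5], [0, 0, 0, 0, 0, 0]))         # case (b): 0.0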
Maybe you should consider a protocol definition:
1) geometric definition of the surface first
2) grant unto that geometric surface intrinsic properties.
2.a) a step function can be based on a quadratic curve between two peaks or two troughs, with their concatenated point as the focus of the 'roughness quadratic', using the slope to define roughness in analogy to the science behind road speed bumps.
2.b) elliptical objects can be defined by a combination of deformation analysis and circles centered on the incongruity within the body. This can be solved in many ways analogous to step functions.
2.c) flat lines: select points that deviate from the mean and do a Newtonian fit around them with a window of 5-20 concatenated points, or whatever is clever.
3) define a proper threshold that fits whatever intuition you are defining as "roughness", or apply the conventions of any professional field to your liking.
This branched approach might be quicker to program, but I am certain this solution can be refactored into a Euclidean construct of 3-point ellipticals, if someone is up for a geometry problem.
The mathematical definitions of many surface parameters can be found here, which can be easily put into numpy:
https://www.keyence.com/ss/products/microscope/roughness/surface/parameters.jsp
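For example, the arithmetic average roughness Ra (one of the parameters defined on that page) is essentially a one-liner; a quick sketch:

import numpy as np

def Ra(y):
    # arithmetic average of the absolute deviations from the mean line
    y = np.asarray(y, dtype=float)
    return np.mean(np.abs(y - y.mean()))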
Image (d) shows a challenge: basically you want to flatten the shape before doing the calculation. This requires prior knowledge of the type of geometry you want to fit. I found an app, Gwyddion, that can do this in 3D, but it can only interface with Python 2.7, not 3.
If you know which base shape lies underneath:
1) fit the known shape
2) calculate the arc distance between each two points
3) remap the numbers by subtracting 1) from the original data and assigning new coordinates according to 2)
4) perform normal 2D/3D roughness calculations
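A rough sketch of these steps, assuming the base shape can be modelled as a low-order polynomial (my assumption; a circle or any other parametric shape would work the same way), and reusing the roughness() function from the other answer:

import numpy as np

def flatten_and_roughness(x, y, deg=2):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # 1) fit the known base shape (here: a polynomial of degree deg)
    coeffs = np.polyfit(x, y, deg)
    y_fit = np.polyval(coeffs, x)
    # 2) arc distance between adjacent points along the fitted shape
    seg = np.hypot(np.diff(x), np.diff(y_fit))
    arc = np.concatenate(([0.0], np.cumsum(seg)))
    # 3) remap: residual height versus arc length along the base shape
    x_new = arc
    y_new = y - y_fit
    # 4) normal roughness calculation on the flattened profile
    return roughness(x_new, y_new)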
From the Tutorial: https://programtalk.com/vs2/?source=python/8176/opencv-python-blueprints/chapter4/scene3D.py
I don't understand why they first undistort the images
# undistort the images
self.img1 = cv2.undistort(self.img1, self.K, self.d)
self.img2 = cv2.undistort(self.img2, self.K, self.d)
and then compute the Essential Matrix:
def _find_fundamental_matrix(self):
    self.F, self.Fmask = cv2.findFundamentalMat(self.match_pts1,
                                                self.match_pts2,
                                                cv2.FM_RANSAC, 0.1, 0.99)

def _find_essential_matrix(self):
    self.E = self.K.T.dot(self.F).dot(self.K)
and also Normalize the coordinates:
first_inliers = []
second_inliers = []
for i in range(len(self.Fmask)):
    if self.Fmask[i]:
        # normalize and homogenize the image coordinates
        first_inliers.append(self.K_inv.dot([self.match_pts1[i][0],
                                             self.match_pts1[i][1], 1.0]))
        second_inliers.append(self.K_inv.dot([self.match_pts2[i][0],
                                              self.match_pts2[i][1], 1.0]))
Shouldn't it be one or the other? Or do I have some misunderstanding here?
Can someone please help me with that?
The first step, undistort, does a number of things to reverse the typical warping caused by small camera lenses. See the Wikipedia article on distortion (optics) for more background.
The last step, homogenizing the coordinates, is a completely different thing. The Wikipedia article on homogeneous coordinates explains it, but the basic idea is that you add an extra fake axis that lets you do all affine and projective transformations with chained simple matrix multiplications and then just project back to 3D at the end. Normalizing is just a step you do to make that math easier: basically, you want your extra coordinate to start off as 1.0 (multiply by the inverse of the projective norm).
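A toy illustration of that idea (H here is just a placeholder transform, not anything from the tutorial):

import numpy as np

p = np.array([320.0, 220.0, 1.0])   # image point (x, y) homogenized to (x, y, 1)
H = np.eye(3)                       # placeholder 3x3 projective transform
q = H.dot(p)                        # chained transforms are plain matrix multiplications
q = q / q[2]                        # re-normalize so the extra coordinate is 1.0 again
x, y = q[0], q[1]                   # project back to ordinary image coordinates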
The requirement for normalization is explained on page 107 of Multiple View Geometry (Hartley and Zisserman). The normalization is required in addition to the undistortion.
If you are using raw pixel values in homogeneous coordinates, the Z coordinate, which is 1, will be small compared to the X and Y coordinates, e.g. (X=320, Y=220, Z=1).
But if the homogenized coordinates are the image pixel positions normalized to a standard range, i.e. -1.0 to 1.0, then all the coordinate values are roughly in the same range, e.g. (0.75, -0.89, 1.0).
If the image coordinates have dramatically different ranges (as in the unnormalized case), then the resulting DLT matrix will have a bad condition number, and consequently small variations in the input image pixel positions could produce wide variations in the result.
Please see page 107 for a very good explanation.
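As a rough sketch of the kind of point normalization Hartley and Zisserman describe (translate the points to their centroid and scale them so the average distance from the origin is sqrt(2); the function name is mine):

import numpy as np

def normalize_points(pts):
    # pts: (N, 2) array of raw pixel coordinates
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2) / mean_dist
    # similarity transform T mapping raw pixels to normalized homogeneous coordinates
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])   # homogenize
    return (T.dot(pts_h.T)).T, T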
I am working in Python, and I am trying to compute a weight matrix for a graph of pixels, where the weight of each edge depends on the pixels' "feature" similarity (F(i) - F(j)) and their location similarity (X(i) - X(j)). "Features" include intensity, color, and texture.
Right now I have it implemented and it is working, but not for color images. At first I tried to simply average the RGB values of each pixel to convert the entire image to greyscale. But that didn't work as I had hoped, and I have read through a paper that suggests a different method.
They say to use this: F(i) = [v, v*s*sin(h), v*s*cos(h)](i)
where h, s, and v are the HSV color values.
I am just confused about the notation. What is this supposed to mean? What does it mean to have three different terms separated by commas inside square brackets? I am also confused about what the (i) at the end is supposed to mean. Shouldn't F(i) for any given pixel be a single number, so that F(i) - F(j) can be carried out?
I'm not asking for someone to do this for me; I just need some clarification.
Features can be vectors and you can calculate distance between vectors.
import numpy
f1 = numpy.array([1, 2, 3])
f2 = numpy.array([0, 2, 3])
distance = numpy.linalg.norm(f1 - f2)
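In your case, F(i) is such a 3-element vector for pixel i; a minimal sketch of computing it from the paper's formula (assuming h is an angle in radians; with OpenCV's HSV you would first have to rescale its 0-179 hue range):

import numpy as np

def F(h, s, v):
    # 3-element feature vector for one pixel, following the formula from the paper
    return np.array([v, v * s * np.sin(h), v * s * np.cos(h)])

fi = F(h=1.2, s=0.5, v=0.8)         # example HSV values for pixel i
fj = F(h=0.3, s=0.9, v=0.4)         # example HSV values for pixel j
distance = np.linalg.norm(fi - fj)  # feature similarity term |F(i) - F(j)|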
I'm implementing a CNN with Theano. According to the paper, I have to do this image preprocessing before training the CNN:
We extracted RGB patches of 61x61 dimensions associated with each poselet activation, subtracted the mean and used this data to train the convnet model shown in Table 1
Can you tell me what "subtracted the mean" means? Tell me if these steps are correct (this is what I understood):
1) Compute the mean of the red channel, green channel and blue channel for the whole image
2) For each pixel, subtract the red-channel mean from its red value, the green-channel mean from its green value, and likewise for the blue channel
3) Is it correct to have negative values, or do I have to use the absolute value?
Thanks all!!
You should read the paper carefully, but most probably they mean the mean of the patches. You have N matrices of 61x61 pixels, each equivalent to a vector of length 61^2 (or 3*61^2 if there are three channels). What they do is simply compute the mean of each dimension, i.e. the mean over these N vectors with respect to each of the 3*61^2 dimensions. As a result they obtain a mean vector of length 3*61^2 (or a mean matrix / mean patch, if you prefer) and they subtract it from all of these N patches. The resulting patches will have negative values; that is perfectly fine, you should not take the absolute value, neural networks prefer this kind of data.
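In NumPy terms this would be something like the following sketch (the array shapes are illustrative):

import numpy as np

# N patches of 61x61 pixels with 3 channels (placeholder data)
patches = np.random.rand(1000, 61, 61, 3)

mean_patch = patches.mean(axis=0)          # shape (61, 61, 3): one mean per dimension
patches_centered = patches - mean_patch    # may contain negative values, which is fine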
I would assume the mean mentioned in the paper is the mean over all images used in the training set (computed separately for each channel).
Several indications:
Caffe is a lib for ConvNets. In their tutorial they mention the compute image mean part: http://caffe.berkeleyvision.org/gathered/examples/imagenet.html
For this they use the following script: https://github.com/BVLC/caffe/blob/master/examples/imagenet/make_imagenet_mean.sh
which does what I indicated.
Google played around with ConvNets and published their code here: https://github.com/google/deepdream/blob/master/dream.ipynb and they also use the mean of the training set.
This is of course only indirect evidence, since I cannot explain to you why this is done. In fact, I stumbled over this question while trying to figure out precisely that.
//EDIT:
In the meantime I found a source confirming my claim (highlighting added by me):
There are three common forms of data preprocessing a data matrix X [...]
Mean subtraction is the most common form of preprocessing. It
involves subtracting the mean across every individual feature in the
data, and has the geometric interpretation of centering the cloud of
data around the origin along every dimension. In numpy, this operation
would be implemented as: X -= np.mean(X, axis = 0). With images
specifically, for convenience it can be common to subtract a single
value from all pixels (e.g. X -= np.mean(X)), or to do so separately
across the three color channels.
As we can see, the whole data set is used to compute the mean.
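A quick sketch of the per-channel variant mentioned at the end of that quote (shapes are illustrative):

import numpy as np

# training set: N images of height H, width W, 3 color channels (placeholder data)
X = np.random.rand(1000, 61, 61, 3)

channel_mean = X.mean(axis=(0, 1, 2))   # one mean per color channel, shape (3,)
X_centered = X - channel_mean           # broadcast over every pixel of every image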