Get HOG image features from OpenCV + Python?

I've read this post about how to use OpenCV's HOG-based pedestrian detector: How can I detect and track people using OpenCV?
I want to use HOG for detecting other types of objects in images (not just pedestrians). However, the Python binding of HOGDetectMultiScale doesn't seem to give access to the actual HOG features.
Is there any way to use Python + OpenCV to extract the HOG features directly from any image?

In Python OpenCV you can compute HOG like this:
import cv2
hog = cv2.HOGDescriptor()
im = cv2.imread(sample)
h = hog.compute(im)
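With the default constructor parameters (64x128 window, 16x16 blocks, 8x8 block stride, 8x8 cells, 9 bins), every detection window contributes 7 x 15 blocks x 4 cells x 9 bins = 3780 values, and compute() slides that window over the whole image, so h holds 3780 values per window position. A quick check (a sketch, assuming im is at least 64x128 pixels):
print(hog.getDescriptorSize())   # 3780 for the default parameters
print(h.size % 3780)             # 0: h is a whole number of per-window descriptors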

1. Get inbuilt documentation: The following command in your Python console will help you know the structure of the class HOGDescriptor:
import cv2
help(cv2.HOGDescriptor())
2. Example code: Here is a snippet of code to initialize a cv2.HOGDescriptor with different parameters (the terms used here are standard terms which are well defined in the OpenCV documentation here):
import cv2
image = cv2.imread("test.jpg",0)
winSize = (64,64)
blockSize = (16,16)
blockStride = (8,8)
cellSize = (8,8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 0
nlevels = 64
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold, gammaCorrection, nlevels)
#compute(img[, winStride[, padding[, locations]]]) -> descriptors
winStride = (8,8)
padding = (8,8)
locations = ((10,20),)
hist = hog.compute(image,winStride,padding,locations)
3. Reasoning: The resulting HOG descriptor will have dimension:
9 orientations x (4 corner cells that get 1 normalization + 6x4 cells on the edges that get 2 normalizations + 6x6 inner cells that get 4 normalizations) = 1764, since I have given only one location to hog.compute(). Equivalently: 7x7 blocks x 4 cells per block x 9 bins = 1764. (A small sanity check appears after this list.)
4. One more way to initialize is from an XML file that contains all the parameter values:
hog = cv2.HOGDescriptor("hog.xml")
To generate such an XML file you can do the following:
hog = cv2.HOGDescriptor()
hog.save("hog.xml")
and then edit the respective parameter values in the XML file.
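As a quick cross-check (a sketch that assumes the parameter values and the hist variable from the snippet above), you can recompute the expected descriptor length from the window, block, stride and cell sizes and compare it with what hog.compute() returns:
# Sketch: recompute the expected descriptor length (assumes winSize, blockSize,
# blockStride, cellSize, nbins and hist from the snippet above)
blocks_per_row = (winSize[0] - blockSize[0]) // blockStride[0] + 1   # (64 - 16) // 8 + 1 = 7
blocks_per_col = (winSize[1] - blockSize[1]) // blockStride[1] + 1   # 7
cells_per_block = (blockSize[0] // cellSize[0]) * (blockSize[1] // cellSize[1])  # 2 * 2 = 4
expected = blocks_per_row * blocks_per_col * cells_per_block * nbins  # 7 * 7 * 4 * 9 = 1764
print(expected, hog.getDescriptorSize(), hist.size)  # all three should be 1764 for a single location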

Here is a solution that uses only OpenCV:
import numpy as np
import cv2
import matplotlib.pyplot as plt
img = cv2.cvtColor(cv2.imread("/home/me/Downloads/cat.jpg"),
                   cv2.COLOR_BGR2GRAY)
cell_size = (8, 8)  # h x w in pixels
block_size = (2, 2)  # h x w in cells
nbins = 9  # number of orientation bins
# winSize is the size of the image cropped to a multiple of the cell size
hog = cv2.HOGDescriptor(_winSize=(img.shape[1] // cell_size[1] * cell_size[1],
                                  img.shape[0] // cell_size[0] * cell_size[0]),
                        _blockSize=(block_size[1] * cell_size[1],
                                    block_size[0] * cell_size[0]),
                        _blockStride=(cell_size[1], cell_size[0]),
                        _cellSize=(cell_size[1], cell_size[0]),
                        _nbins=nbins)
n_cells = (img.shape[0] // cell_size[0], img.shape[1] // cell_size[1])
hog_feats = hog.compute(img)\
               .reshape(n_cells[1] - block_size[1] + 1,
                        n_cells[0] - block_size[0] + 1,
                        block_size[0], block_size[1], nbins) \
               .transpose((1, 0, 2, 3, 4))  # index blocks by rows first
# hog_feats now contains the gradient amplitudes for each direction,
# for each cell of each block. Indexing is by rows then columns.
gradients = np.zeros((n_cells[0], n_cells[1], nbins))
# count cells (border cells appear in fewer overlapping blocks than inner cells)
cell_count = np.full((n_cells[0], n_cells[1], 1), 0, dtype=int)
for off_y in range(block_size[0]):
    for off_x in range(block_size[1]):
        gradients[off_y:n_cells[0] - block_size[0] + off_y + 1,
                  off_x:n_cells[1] - block_size[1] + off_x + 1] += \
            hog_feats[:, :, off_y, off_x, :]
        cell_count[off_y:n_cells[0] - block_size[0] + off_y + 1,
                   off_x:n_cells[1] - block_size[1] + off_x + 1] += 1
# Average gradients
gradients /= cell_count
# Preview
plt.figure()
plt.imshow(img, cmap='gray')
plt.show()
bin = 5 # angle is 360 / nbins * direction
plt.pcolor(gradients[:, :, bin])
plt.gca().invert_yaxis()
plt.gca().set_aspect('equal', adjustable='box')
plt.colorbar()
plt.show()
I used HOG descriptor computation and visualization to understand the data layout, and vectorized the loops over blocks.
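As a quick check before trusting that reshape (a sketch using the variables defined above), the flat descriptor length should equal the number of blocks times cells per block times bins:
# Sketch: the flat output of hog.compute(img) should match the assumed block layout
n_blocks = (n_cells[0] - block_size[0] + 1, n_cells[1] - block_size[1] + 1)
assert hog.compute(img).size == n_blocks[0] * n_blocks[1] * block_size[0] * block_size[1] * nbins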

Although a built-in method exists, as mentioned in the previous answers:
hog = cv2.HOGDescriptor()
I would like to post a Python implementation that you can find in OpenCV's examples directory, hoping it can be useful for understanding HOG functionality:
import cv2
import numpy as np
from numpy.linalg import norm

def hog(img):
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
    bin_n = 16  # Number of bins
    bins = np.int32(bin_n * ang / (2 * np.pi))  # quantize gradient directions into bin_n bins
    bin_cells = []
    mag_cells = []
    cellx = celly = 8
    for i in range(0, img.shape[0] // celly):
        for j in range(0, img.shape[1] // cellx):
            bin_cells.append(bins[i*celly : i*celly+celly, j*cellx : j*cellx+cellx])
            mag_cells.append(mag[i*celly : i*celly+celly, j*cellx : j*cellx+cellx])
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)
    # transform to Hellinger kernel
    eps = 1e-7
    hist /= hist.sum() + eps
    hist = np.sqrt(hist)
    hist /= norm(hist) + eps
    return hist
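For example (a sketch, assuming any grayscale image of at least 64x64 pixels; the file name is just a placeholder), a 64x64 crop yields 8x8 cells x 16 bins = 1024 values:
# Sketch: run the sample hog() on a 64x64 grayscale crop
img = cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE)[:64, :64]
descriptor = hog(img)
print(descriptor.shape)   # expected: (1024,) = (64 // 8) * (64 // 8) cells * 16 bins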
Regards.

I would disagree with the argument of peakxu. The HOG detector is, in the end, "just" a rigid linear filter. Any degrees of freedom in the "object" (i.e. persons) lead to blurring in the detector, and are not actually handled by it. There is an extension of this detector using latent SVMs that does explicitly handle degrees of freedom by introducing structural constraints between independent parts (i.e. head, arms, etc.) as well as allowing for multiple appearances per object (i.e. frontal people and sideways people...).
Regarding the HOG detector in opencv: in theory you can upload another detector to be used with the features, but you cannot, afaik, get the features themselves. Thus, if you have a trained detector (i.e. a class-specific linear filter) you should be able to upload it to get the fast detection performance of opencv. That said, it should be easy to hack the opencv source code to provide this access and to propose this patch back to the maintainers.
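For the "upload another detector" part, here is a minimal sketch (not from the original answer) of how a custom linear filter can be plugged in via setSVMDetector, assuming w and b are the weight vector and bias of a linear SVM you trained elsewhere on HOG descriptors with matching parameters (hypothetical names):
import cv2
import numpy as np

hog = cv2.HOGDescriptor()                        # parameters must match those used when training w
# w, b are hypothetical: coefficients and bias of your own pre-trained linear SVM
detector = np.append(w, b).astype(np.float32)    # note: the bias sign convention depends on your SVM library
hog.setSVMDetector(detector)
rects, scores = hog.detectMultiScale(cv2.imread("scene.jpg"), winStride=(8, 8))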

I would not recommend using HOG features for detecting objects other than pedestrians. In the original HOG paper by Dalal and Triggs, they specifically mention that their detector is built around pedestrian detection: it allows significant degrees of freedom in the limbs while relying on strong structural cues around the human body.
Instead, try looking at OpenCV's HaarDetectObjects. You can learn how to train your own cascades here.
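For reference, a minimal usage sketch with the current Python API (cv2.CascadeClassifier has replaced the old cvHaarDetectObjects; my_cascade.xml is a hypothetical file produced by opencv_traincascade):
import cv2

cascade = cv2.CascadeClassifier("my_cascade.xml")
gray = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2GRAY)
objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in objects:
    cv2.rectangle(gray, (x, y), (x + w, y + h), 255, 2)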

Related

Refine Lee filter implementation in Python: filter according to the edge assigned at the pixel

Ref:
Question: Speckle (Lee Filter) in Python
Relevant answer to the current question; the code below is borrowed from there.
from scipy.ndimage import uniform_filter, variance

def lee_filter(img, size):
    img_mean = uniform_filter(img, (size, size))
    img_sqr_mean = uniform_filter(img**2, (size, size))
    img_variance = img_sqr_mean - img_mean**2
    overall_variance = variance(img)
    img_weights = img_variance / (img_variance + overall_variance)
    img_output = img_mean + img_weights * (img - img_mean)
    return img_output
Question:
In the above code, instead of a uniform size for the filter, I would like to specify one of the predefined windows below and filter the image only with respect to the window assigned at that pixel.
edge1 = np.array([[1,1,1],[0,1,1],[0,0,1]])
edge2 = np.array([[0,1,1],[0,1,1],[0,1,1]])
edge3 = np.array([[0,0,1],[0,1,1],[1,1,1]])
edge4 = np.array([[0,0,0],[1,1,1],[1,1,1]])
edge5 = np.array([[1,0,0],[1,1,0],[1,1,1]])
edge6 = np.array([[1,1,0],[1,1,0],[1,1,0]])
edge7 = np.array([[1,1,1],[1,1,0],[1,0,0]])
edge8 = np.array([[1,1,1],[1,1,1],[0,0,0]])
I want to convolve over the image and assign an edge to every pixel, which will then be the window for the mean filter (instead of the uniform filter).
# The program below is a guess based on the algorithm and is incorrect (and incomplete). Please help me work out assigning a window and filtering the image based on that window.
import numpy as np
import scipy.signal as sg
from scipy.ndimage import uniform_filter

def custom_window_filter(img):
    img_mean = uniform_filter(img, (5, 5))
    edge1 = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
    edge2 = np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]])
    edge3 = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])
    edge4 = np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]])
    edge1_avg = sg.convolve(img_mean, edge1)
    edge2_avg = sg.convolve(img_mean, edge2)
    edge3_avg = sg.convolve(img_mean, edge3)
    edge4_avg = sg.convolve(img_mean, edge4)
    choices = np.ones(img.shape)
    choices[np.where(np.abs(edge2_avg) > np.abs(edge1_avg))] = 2
    choices[np.where(np.abs(edge3_avg) > np.abs(edge2_avg))] = 3
    choices[np.where(np.abs(edge4_avg) > np.abs(edge3_avg))] = 4
    '''
    Use choices here to further refine the edge.
    After acquiring the edge, use that edge to get mean and std deviation from the contents of the uniform data.
    Use the said mean and std deviation to do a gaussian filter on that detected uniform data on the side of the edge.
    Optional: Scale it to arbitrary window size 3x3 or 5x5 or 7x7 or 11x11
    '''
P.S. I'm actually using images of size 122k x 5k (float32); can the processing be sped up using numba, as it supports scipy and numpy operations?

How to extract subimages from an image?

What are the ways to count and extract all subimages given a master image?
Sample 1
Input:
Output should be 8 subgraphs.
Sample 2
Input:
Output should have 6 subgraphs.
Note: These image samples are taken from the internet. Images can be of random dimensions.
Is there a way to draw lines of separation in these images and then split based on those details?
e.g.:
I don't think there'll be a general solution to extract all single figures properly from arbitrary tables of figures (as shown in the two examples), at least not using some kind of "simple" image-processing techniques.
For "perfect" tables with a constant grid layout and constant colour space between single figures (as shown in the two examples), the following approach might be an idea:
Calculate the mean standard deviation in the x and y directions, and threshold using some custom parameter. The mean standard deviation within the constant colour spaces should be near zero. A custom parameter will be needed here, since there'll be artifacts, e.g. from JPG compression, whose effects might be more or less severe.
Do some binary closing on the mean standard deviations using custom parameters. There might be small constant colour spaces around captions or similar, cf. the second example. Again, custom parameters will be needed here, too.
From the resulting binary "signal", we can extract the start and stop positions for each subimage, and thus the subimage itself by slicing from the original image. Attention: that only works if the tables have a constant grid layout!
Here's some code for the described approach:
import cv2
import numpy as np
from skimage.morphology import binary_closing

def extract_from_table(image, std_thr, kernel_x, kernel_y):
    # Threshold on mean standard deviation in x and y direction
    std_x = np.mean(np.std(image, axis=1), axis=1) > std_thr
    std_y = np.mean(np.std(image, axis=0), axis=1) > std_thr
    # Binary closing to close small whitespaces, e.g. around captions
    std_xx = binary_closing(std_x, np.ones(kernel_x))
    std_yy = binary_closing(std_y, np.ones(kernel_y))
    # Find start and stop positions of each subimage
    start_y = np.where(np.diff(np.int8(std_xx)) == 1)[0]
    stop_y = np.where(np.diff(np.int8(std_xx)) == -1)[0]
    start_x = np.where(np.diff(np.int8(std_yy)) == 1)[0]
    stop_x = np.where(np.diff(np.int8(std_yy)) == -1)[0]
    # Extract subimages
    return [image[y1:y2, x1:x2, :]
            for y1, y2 in zip(start_y, stop_y)
            for x1, x2 in zip(start_x, stop_x)]

for file in ['image1.jpg', 'image2.png']:
    img = cv2.imread(file)
    cv2.imshow('image', img)
    subimages = extract_from_table(img, 5, 21, 11)
    print('{} subimages found.'.format(len(subimages)))
    for i in subimages:
        cv2.imshow('subimage', i)
        cv2.waitKey(0)
The print output is:
8 subimages found.
6 subimages found.
Also, each subimage is shown for visualization purposes.
For both images, the same parameters were suitable, but that's just some coincidence here!
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
NumPy: 1.20.1
OpenCV: 4.5.1
scikit-image: 0.18.1
----------------------------------------
I could only extract the sub-images using a simple array-slicing technique. I am not sure if this is what you are looking for, but if one knows the number of table rows and columns, I think you can extract the sub-images.
image = cv2.imread('table.jpg')
p = 2  # number of rows
q = 4  # number of columns
height, width, channels = image.shape  # note: shape is (rows, cols, channels)
height_patch = height // p
width_patch = width // q
x = 0
for i in range(0, height - height_patch + 1, height_patch):
    for j in range(0, width - width_patch + 1, width_patch):
        crop = image[i:i+height_patch, j:j+width_patch]
        cv2.imwrite("image_{0}.jpg".format(x), crop)
        x += 1
        # cv2.imshow('crop', crop)
        # cv2.waitKey(0)

How to extract first component of FFT from 4D Image

I have a 4D (2D + slices along the z axis + time frames) gray-scale image of the heart beating at different moments.
I would like to take the Fourier transform along the time axis (for each slice separately) and analyze the fundamental harmonic (also called the H1 component, where H stands for Hilbert space), so I can determine the pixel regions corresponding to the ROI which show the strongest response to the cardiac frequency.
I'm using Python for this purpose, and I tried to do it with the following code, but I'm not sure that this is the correct way, because I don't know how to determine the cut-off frequency to keep only the fundamental harmonic.
Here is a link to the image which I'm dealing with.
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
img = nib.load('patient057_4d.nii.gz')
f = np.fft.fft2(img)
# Move the DC component of the FFT output to the center of the spectrum
fshift = np.fft.fftshift(f)
fshift_orig = fshift.copy()
# logarithmic transformation
magnitude_spectrum = 20*np.log(np.abs(fshift))
# Create mask
rows, cols = img.shape
crow, ccol = int(rows/2), int(cols/2)
# Use mask to remove low frequency components
dist1 = 20
dist2 = 10
fshift[crow-dist1:crow+dist1, ccol-dist1:ccol+dist1] = 0
#fshift[crow-dist2:crow+dist2, ccol-dist2:ccol+dist2] = fshift_orig[crow-dist2:crow+dist2, ccol-dist2:ccol+dist2]
# logarithmic transformation
magnitude_spectrum1 = 20*np.log(np.abs(fshift))
f_ishift = np.fft.ifftshift(fshift)
# inverse Fourier transform
img_back = np.fft.ifft2(f_ishift)
# get rid of imaginary part by abs
img_back = np.abs(img_back)
plt.figure(num = 'Im_Back')
plt.imshow(abs(fshift[:,:,2,2]).astype('uint8'),cmap='gray')
plt.show()
The solution was to take the Fourier transform of each slice over (x, y, time) separately, then keep only the component at temporal frequency index 1 (index 0 is the DC component) and transform it back to the spatial domain, and that's it.
The benefit of this is to detect whether something is moving along the third axis (time in my case).
import numpy as np
import numpy.fft as FFT
# img is assumed to be a NumPy array of shape (x, y, slice, time),
# e.g. obtained via nib.load('patient057_4d.nii.gz').get_fdata()
h1_maps = []
for sl in range(img.shape[2]):
    # ----- Fourier, H1 component ------------------------------------
    # ff1[:, :, 1] is the H1 component; index 0 would be the DC component
    ff1 = FFT.fftn(img[:, :, sl, :])
    fh = np.absolute(FFT.ifftn(ff1[:, :, 1]))
    h1_maps.append(fh)  # one H1 magnitude map per slice

Speeding up Fourier-related transform computations in python (OpenCV)

I have an image and I need to compute a fourier-related transform over it called Short Time Fourier Transform (for extra mathematical info check:http://en.wikipedia.org/wiki/Short-time_Fourier_transform).
In order to do that I need to :
(1) place a window at the starting pixel of the image (x,y)=(M/2,M/2)
(2) Truncate the image using this window
(3) Compute the FFT of the truncated image, save results.
(4) Incrementally slide the window to the right
(5) Go to step 3, until window reaches the end of the image
However, I need to perform the aforementioned calculation in real time...
But it is rather slow!!!
Is there any way to speed up the aforementioned process??
I also include my code:
height, width = final_frame.shape
M = 2
for j in range(M/2, height-M/2):
    for i in range(M/2, width-M/2):
        face_win = final_frame[j-M/2:j+M/2, i-M/2:i+M/2]
        # these steps are performed in order to speed up the FFT calculation process
        height_win, width_win = face_win.shape
        fftheight = cv2.getOptimalDFTSize(height_win)
        fftwidth = cv2.getOptimalDFTSize(width_win)
        right = fftwidth - width_win
        bottom = fftheight - height_win
        bordertype = cv2.BORDER_CONSTANT
        nimg = cv2.copyMakeBorder(face_win, 0, bottom, 0, right, bordertype, value=0)
        dft = cv2.dft(np.float32(face_win), flags=cv2.DFT_COMPLEX_OUTPUT)
        dft_shift = np.fft.fftshift(dft)
        magnitude_spectrum = 20*np.log(cv2.magnitude(dft_shift[:,:,0], dft_shift[:,:,1]))
Of course the bulk of your time is going to be spent in the FFT's and other transformation code, but I took a shot at easy optimizations of the other parts.
Changes
Frame size calculations are the same every loop so move them out (~nil improvement)
Type coercion from uint8 to float32 can be done once on the whole image rather than converting each frame. (small but measurable improvement)
If the window size is already the same as the optimal size (I guess it always will be if you keep M as a power of 2), then don't do the bordered copy. Just use the face_win view as-is. (small but measurable improvement)
Total improvement 26s --> 22s. Not much but there it is.
Standalone Code (just add 1024x768.jpg)
import time
import cv2
import numpy as np

# image loading for anybody else who wants to use this
final_frame = cv2.imread('1024x768.jpg')
final_frame = cv2.cvtColor(final_frame, cv2.COLOR_BGR2GRAY)
final_frame_f32 = final_frame.astype(np.float32)  # moved out of the loop

# base data
M = 4
height, width = final_frame.shape

# various calculations moved out of the loop
m_half = M//2
height_win, width_win = [2 * m_half] * 2  # can you even use odd values for M?
fftheight = cv2.getOptimalDFTSize(height_win)
fftwidth = cv2.getOptimalDFTSize(width_win)
bordertype = cv2.BORDER_CONSTANT
right = fftwidth - width_win
bottom = fftheight - height_win

start = time.time()
for j in range(m_half, height-m_half):
    for i in range(m_half, width-m_half):
        face_win = final_frame_f32[j-m_half:j+m_half, i-m_half:i+m_half]
        # only copy for border if necessary
        if (fftheight, fftwidth) == (height_win, width_win):
            nimg = face_win
        else:
            nimg = cv2.copyMakeBorder(face_win, 0, bottom, 0, right, bordertype, value=0)
        dft = cv2.dft(nimg, flags=cv2.DFT_COMPLEX_OUTPUT)
        dft_shift = np.fft.fftshift(dft)
        magnitude_spectrum = 20 * np.log(cv2.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1]))
elapsed = time.time() - start
print(elapsed)
Bugs
I fixed these in the code above but I didn't edit your original since you may have intended it to be that way
you calculate nimg but then use the original face_win in the dft
to be explicit, I changed M/2 etc. to M//2
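If more speed is needed, a further option (a sketch, not part of the original answer, requiring NumPy >= 1.20) is to build all windows at once with sliding_window_view and run the FFTs in batches, one image row of windows per call:
import numpy as np

# Sketch: batch the per-pixel windows and FFT a whole row of windows per call.
# Assumes final_frame_f32 and M are defined as in the snippet above.
windows = np.lib.stride_tricks.sliding_window_view(final_frame_f32, (M, M))  # (H-M+1, W-M+1, M, M), no copy
for row in windows:                               # one row of windows at a time to limit memory
    spectra = np.fft.fft2(row, axes=(-2, -1))     # 2D FFT of every MxM window in this row
    magnitude = 20 * np.log(np.abs(spectra) + 1e-12)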

Correct method and Python package that can find width of an image's feature

The input is a spectrum with colorful (sorry) vertical lines on a black background. Given the approximate x coordinate of that band (as marked by X), I want to find the width of that band.
I am unfamiliar with image processing. Please direct me to the correct method of image processing and a Python image processing package that can do the same.
I am thinking of PIL; OpenCV gives me the impression of being overkill for this particular application.
What if I want to make this an expert system that can classify them in the future?
I'll give a complete minimal working example (as suggested by sega_sai). I don't have access to your original image, but you'll see it doesn't really matter! The peak distributions found by the code below are:
Mean values at: 26.2840960523 80.8255092125
from PIL import Image
from numpy import *
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = asarray(Image.open("band2.png"))

# Average the pixel values along vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)

# Set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()

# Fit function, two gaussians, adjust as needed
def fitfunc(p, x):
    return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
           p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))
errfunc = lambda p, x, y: fitfunc(p, x) - y

# Use scipy to fit, p0 is initial guess
p0 = array([0, 20, 1, 0, 75, 10])
X = arange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X, projection))
Y = fitfunc(p1, X)

# Output the result
print("Mean values at:", p1[1], p1[4])

# Plot the result
from pylab import *
subplot(211)
imshow(pic)
subplot(223)
plot(projection)
subplot(224)
plot(X, Y, 'r', lw=5)
show()
Below is a simple thresholding method to find the lines and their width; it should work quite reliably for any number of lines. The yellow and black image below was processed using this script; the red/black plot illustrates the found lines using parameters threshold = 0.3, min_line_width = 5.
The script averages each column of the image, and then determines the basic start and end positions of each line based on a threshold (which you can set between 0 and 1) and a minimum line width (in pixels). By using thresholding and a minimum line width you can easily filter your input images to get the lines out of them. The first function, find_lines, returns all the lines in an image as a list of tuples containing the start, end, center, and width of each line. The second function, find_closest_band_width, is called with the specified x_position and returns the width of the line closest to this position (assuming you want the distance to the centre of each line). As the lines are saturated (255 cut-off per channel), their cross-sections are not far from a uniform distribution, so I don't believe trying to fit any kind of distribution would really help much; it would just complicate things unnecessarily.
from PIL import Image, ImageStat

def find_lines(image_file, threshold, min_line_width):
    im = Image.open(image_file)
    width, height = im.size
    hist = []
    lines = []
    start = end = 0
    for x in range(width):
        column = im.crop((x, 0, x + 1, height))
        stat = ImageStat.Stat(column)
        ## normalises by 2 * 255 as in your example the colour is yellow
        ## if your images start using white lines change this to 3 * 255
        hist.append(sum(stat.sum) / (height * 2 * 255))
    for index, value in enumerate(hist):
        if value > threshold and end >= start:
            start = index
        if value < threshold and end < start:
            if index - start < min_line_width:
                start = 0
            else:
                end = index
                center = start + (end - start) / 2.0
                width = end - start
                lines.append((start, end, center, width))
    return lines

def find_closest_band_width(x_position, lines):
    distances = [((value[2] - x_position) ** 2) for value in lines]
    index = distances.index(min(distances))
    return lines[index][3]

## set your threshold, and min_line_width for finding lines
lines = find_lines("8IxWA_sample.png", 0.7, 4)

## sets x_position to 59th pixel
print('width of nearest line:', find_closest_band_width(59, lines))
I don't think you need anything fancy for your particular task.
I would just use PIL + scipy. That should be enough, because you essentially need to take your image, make a 1D projection of it, and then fit a Gaussian or something like that to it. The information about the approximate location of the band should be used as the first guess for the fitter.
