I'm very new to OpenCV, and I want to create a simple object detector that uses an SVM. Instead of HOG features, I would like to extract color histograms (for example) from my object, but I couldn't find any information about this for OpenCV; every example uses HOG.
And my second question: does the Python implementation of SVM have less functionality than the C++ one (both for OpenCV)?
You can use the OpenCV function calcHist to compute histograms.
calcHist(&bgr_planes[0], 1, 0, Mat(), b_hist, 1, &histSize, &histRange, uniform, accumulate );
where,
&bgr_planes[0]: The source array(s)
1: The number of source arrays
0: The channel (dim) to be measured. In this case it is just the intensity, so we write 0.
Mat(): A mask to be used on the source array
b_hist: The Mat object where the histogram will be stored
1: The histogram dimensionality
histSize: The number of bins per each used dimension
histRange: The range of values to be measured per each dimension
uniform and accumulate: The bin sizes are the same and the histogram is cleared at the beginning
Refer to the docs for more information.
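Since you're asking about Python: a minimal sketch of extracting a BGR color histogram as a flat feature vector for an SVM (the file name and bin count are placeholder choices):
import cv2
import numpy as np

# load the image (path is a placeholder)
img = cv2.imread("object.png")

# one 32-bin histogram per BGR channel, concatenated into a single feature vector
hists = [cv2.calcHist([img], [c], None, [32], [0, 256]) for c in range(3)]
feature_vector = np.concatenate(hists).ravel()

# normalize so the feature is independent of image size
feature_vector /= feature_vector.sum()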
You can also look at this answer, which discusses the C++ OpenCV SVM implementation, and this answer, which discusses the Python OpenCV SVM implementation, to get started.
Outline
I'm trying to warp an image (of a spectral peak in a time series, but this is not important) by generating a polynomial based on some 'centroid' data associated with the image (the peak at each time step) and then augmenting that polynomial. The original and augmented polynomials provide the 'source' and 'destination' points, respectively, with which I warp the image using skimage.transform.warp().
The goal of this warping is to produce two warped images (i.e. repeat the process twice). These images would then be positively correlated with one another, or negatively correlated if one of the two warped images were to be horizontally flipped (again, not that important here).
Here is an example output for comparison:
(Note that the polynomial augmentation is performed by adding/subtracting noise at each polynomial peak/trough, proportional to the pixel magnitude at each point, and then fitting a new polynomial of the same order through these augmented points, with additional fixed points in place to prevent the augmented polynomial from inverting.)
Code Snippet
I achieve this in code by creating a GeometricTransform and passing it to warp() as the inverse_map, as follows:
from skimage import transform
# Create the transformation object using the source and destination (N, 2) arrays in reverse order
# (as there is no explicit way to do an inverse polynomial transformation).
t = transform.estimate_transform('polynomial', src=destination, dst=source, order=4) # order = num_poly_degrees - 1
# Warp the original image using the transformation object
warped_image = transform.warp(image, t, order=0, mode='constant', cval=float('nan'))
Problems
I have two main problems with the resulting warp:
There are white spaces left behind due to the image warp. I know that this can be solved by changing the mode within transform.warp() from 'constant' to 'reflect', for example. However, that would repeat existing data, which is related to my next problem...
Assuming I have implemented the warp correctly, it seems to have raised the 'zig-zag' feature seen at time step 60 to ~50 (red circles). My goal with the warping is to horizontally warp the images so that each feature remains within its own time step (perhaps give-or-take a very small amount), but their 'pixel' position (x-axis) is augmented. This is also why I am unsure about using 'reflect' or another mode within transform.warp(), as this would artificially add more data, which would cause problems later in my pipeline where I compare pairs of warped images to see how they are correlated (relating back to my second paragraph in Outline).
My Attempts
I have tried using RANSAC to improve the warping, as in this question, which also uses a polynomial transformation: Robustly estimate Polynomial geometric transformation with scikit-image and RANSAC. I had hoped that this method would leave behind only smaller white spaces, in which case I would have been satisfied with switching to another mode within transform.warp(); however, it fixes neither of my issues, as the performance was about the same.
I have also looked into using a piecewise affine transformation and Delaunay triangulation (using cv2) as a means of both preserving the correct image dimensions (without repeating data) and having minimal y-component warping. The results do solve the two stated problems; however, the warping effect is almost imperceptible, and I am not sure if I should continue down this path by adding more triangles and trying more separated source and destination points (though this line of thought may require another post).
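For what it's worth, scikit-image also exposes a piecewise affine transform directly, which is roughly equivalent to the cv2/Delaunay approach; a minimal sketch, assuming the same source and destination (N, 2) arrays as above:
from skimage import transform

# Estimate a piecewise affine mapping (the triangulation is done internally);
# src and dst are swapped as before, since warp() expects an inverse map
t = transform.PiecewiseAffineTransform()
t.estimate(destination, source)
warped_image = transform.warp(image, t, order=0, mode='constant', cval=float('nan'))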
Summary
I would like a way to warp my images horizontally using a polynomial transformation (any other suggestions for a transformation method are also welcome!), which does its best to preserve the image's features within their original time steps.
Thank you for your time.
Edit
Here is a link to a shared Google Drive directory containing a .py file and the data necessary to run an example of this process.
I'm using the ITK Python wrapper (ITK, not SimpleITK) to prototype automated processing of µCT output. I need to compute a 3D object thickness map, but this feature doesn't exist as-is in ITK. The pipeline is simple:
Binarize the object
Compute the distance transform
Extract the medial axis as the local maxima of the distance map
My problem is that itk::RegionalMaximaImageFilter does not behave as expected (it does not preserve branches). So I wanted to write a custom function that checks whether the central pixel is >= its neighborhood within a 3x3x3 sliding kernel.
My idea is to take advantage of the optimized itk::RegionalMaximaImageFilter iterator (see here). However, even though this works perfectly in C++, I can't find a workaround in Python (without wrapping C code with Cython).
Python wrapping is not meant to access iterators, but rather to invoke existing classes. What you can do is write a class in C++ and follow this to create a module which can be wrapped and used from Python.
" I wanted to write a custom function that check if the central pixel is >= to its neigborhood with a 3x3x3 sliding kernel."
This is related to gray-scale morphology's dilate which is the maximum of all the pixels in a neighborhood. In SimpleITK ( since you tagged the post ) you could simply write:
isMaximumImg = (sitk.GrayscaleDilateImageFilter(inImg, 1) == inImg)
This will result in an image where if the pixel is equal to the maximum in the neighborhood, then the output value is 1, otherwise it would be 0. This should also be able to be implemented with ITK Python by composing a similar pipeline of Filters.
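Composing the full pipeline in SimpleITK might look like the following sketch (the input path and threshold values are placeholders):
import SimpleITK as sitk

# read the µCT volume (path is a placeholder)
img = sitk.ReadImage("volume.nii")

# 1. binarize the object (threshold values are placeholders)
binary = sitk.BinaryThreshold(img, lowerThreshold=100, upperThreshold=65535,
                              insideValue=1, outsideValue=0)

# 2. distance transform, positive inside the object
dist = sitk.SignedMaurerDistanceMap(binary, insideIsPositive=True,
                                    squaredDistance=False, useImageSpacing=True)

# 3. a voxel equal to the grayscale dilation of its 3x3x3 neighborhood
#    is >= all of its neighbors, i.e. a local maximum
localMax = sitk.GrayscaleDilate(dist, 1) == dist

# keep only maxima inside the object
medialAxis = localMax * binary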
I'm looking to perform optical character recognition (OCR) on a display, and want the program to work under different light conditions. To do this, I need to process and threshold the image such that there is no noise surrounding each digit, allowing me to detect the contour of the digit and perform OCR from there. I need the threshold value I use to be adaptable to these different light conditions. I've tried adaptive thresholding, but I haven't been able to get it to work.
My image processing is simple: load the image (i), grayscale i (g), apply a histogram equalization to g (h), and apply a binary threshold to h with a threshold value = t. I've worked with a couple of different datasets, and found that the optimal threshold value to make the OCR work consistently lies within the range of highest density in a histogram plot of (h) (the only part of the plot without gaps).
A histogram of (h). The values t=[190,220] are optimal for OCR. A more complete set of images describing my problem is available here: http://imgur.com/a/wRgi7
My current solution, which works but is clunky and slow, checks for:
1. There must be 3 digits
2. The first digit must be reasonably small in size
3. There must be at least one contour recognized as a digit
4. The digit must be recognized in the digit dictionary
If not all of these checks pass, the threshold is increased by 10 (beginning at a low value) and another attempt is made.
The fact that I can recognize the optimal threshold value on the histogram plot of (h) may just be confirmation bias, but I'd like to know if there's a way I can extract the value. This is different from how I've worked with histograms before, which has been more on finding peaks/valleys.
I'm using cv2 for image processing and matplotlib.pyplot for the histogram plots.
Check this: link. It does not really depend on density; it works because you separated the two maxima. The local maxima are the main classes: the left local maximum is the foreground (text pixels), and the right local maximum is the background (white paper). The optimal threshold should separate these maxima as well as possible, and its value lies in the local-minimum region between the two local maxima.
At first, I thought "well, just make a histogram of the indexes at which data appears", which would totally work, but I don't think that would actually solve the underlying problem you want to solve.
I think you're misinterpreting histogram equalization. What histogram equalization does is thin out the histogram in highly concentrated areas, so that if you take different bin sizes for the histogram, you'll get more or less equal quantities inside the bins. The only reason those values are dense is specifically because they appear less often in the image; histogram equalization makes other, more popular values appear less dense. And the reason that range works out well is, as you can see in the original grayscale histogram, that values between 190 and 220 are really close to where the image begins to get bright again, i.e., where there is a clear demarcation of bright values.
You can see the way equalizeHist works directly by plotting histograms with different bin sizes. For example, here's looping over bin sizes from 3 to 20:
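A sketch of that loop, assuming the equalized image from the question is named h and matplotlib is used for plotting:
import matplotlib.pyplot as plt

# the same data plotted at several bin counts; dense regions stay dense
for bins in range(3, 21):
    plt.hist(h.ravel(), bins=bins)
    plt.title("bins = %d" % bins)
    plt.show()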
Edit: So just to be clear, what you want is the demarcated area between the lower bump and the higher bump in your original histogram. You don't need equalized histograms for this. In fact, this is what Otsu thresholding (following Otsu's method) actually does: you assume the data follows a bimodal distribution, and find the point which best separates the two distributions.
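A minimal sketch of Otsu's thresholding in OpenCV (the file name is a placeholder):
import cv2

# load in grayscale
img = cv2.imread("display.png", 0)

# passing 0 as the threshold lets Otsu's method compute it from the histogram
t, binarized = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t)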
Basically, what you're asking is to find the indexes of the longest sequence of non-zero elements in a 256 x 1 array.
Based on this answer, you should get what you want like this:
import cv2
import numpy as np
# load in grayscale
img = cv2.imread("image.png",0)
# calcHist returns a (256, 1) array; flatten it for 1-D processing
hist = cv2.calcHist([img],[0],None,[256],[0,256]).ravel()
# pad with False on both ends, then locate where the zero/non-zero state
# changes; paired change indices mark the start/stop of each non-zero run
edges = np.concatenate(([False], hist != 0, [False]))
non_zero_sequences = np.where(np.diff(edges.astype(int)) != 0)[0].reshape(-1,2)
# the run with the largest length is the one we want
longest_sequence_id = np.diff(non_zero_sequences,axis=1).argmax()
longest_sequence_start = non_zero_sequences[longest_sequence_id,0]
longest_sequence_stop = non_zero_sequences[longest_sequence_id,1]
Note that it is untested.
I would also recommend using an automatic thresholding method like Otsu's method (here is a nice explanation of the method).
In Python OpenCV, you have this tutorial that explains how to do Otsu's binarization.
If you want to experiment with other automatic thresholding methods, you can look at the ImageJ / Fiji software. For instance, this page summarizes all the methods implemented.
Grayscale image:
Results:
If you want to reimplement the methods, you can check the source code of the Auto_Threshold plugin. I used Fiji for this demo.
There is a type of texture feature called GLGCM (Gray Level Gradient Based Co-occurrence Matrix) that captures information about how different image gradients co-occur with each other.
GLGCM is different from normal GLCM.
Can anyone help me find an implementation for GLGCM in Python?
I don't have access to the paper right now, so I am not sure about the details, but what if you used GLCM on a gradient image normalized into the 0-255 range?
A Python GLCM implementation can be found in the scikit-image library.
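If you try that, a sketch using scikit-image on a Sobel gradient image (the file name is a placeholder; graycomatrix is spelled greycomatrix in older scikit-image releases):
import numpy as np
from skimage import io, filters
from skimage.feature import graycomatrix, graycoprops

# load as grayscale (path is a placeholder)
img = io.imread("texture.png", as_gray=True)

# gradient magnitude via Sobel, normalized into the 0-255 range
grad = filters.sobel(img)
grad = (255 * (grad - grad.min()) / (np.ptp(grad) + 1e-12)).astype(np.uint8)

# standard GLCM on the gradient image
glcm = graycomatrix(grad, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
contrast = graycoprops(glcm, "contrast")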
OpenCV has very good documentation on generating SIFT descriptors, but this is a version of "weak SIFT", where the key points are detected by the original Lowe algorithm. The OpenCV example reads something like:
import cv2

img = cv2.imread('home.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
sift = cv2.SIFT()
kp = sift.detect(gray, None)      # detect keypoints with Lowe's DoG detector
kp, des = sift.compute(gray, kp)  # compute descriptors at those keypoints
What I'm looking for is strong/dense SIFT, which does not detect keypoints but instead calculates SIFT descriptors for a set of patches (e.g. 16x16 pixels, 8 pixels padding) covering an image as a grid. As I understand it, there are two ways to do this in OpenCV:
I could divide the image in a grid myself, and somehow convert those patches to KeyPoints
I could use a grid-based feature detector
In other words, I'd have to replace the sift.detect() line with something that gives me the keypoints I require.
My problem is that the rest of the OpenCV documentation, especially wrt Python, is severely lacking, so I have no idea how to achieve either of these things. I see in the C++ documentation that there are keypoint detectors for grid, but I don't know how to use these from Python.
The alternative is to switch to VLFeat, which has a very good DSift/PHOW implementation, but that means I'll have to switch from Python to MATLAB.
Any ideas? Thanks.
You can use dense SIFT in OpenCV 2.4.6 and later.
cv2.FeatureDetector_create(detectorType) creates a feature detector by its name; pass the string "Dense" as detectorType, e.g.:
sift = cv2.SIFT()
dense = cv2.FeatureDetector_create("Dense")  # grid-based keypoint generator
kp = dense.detect(imgGray)                   # keypoints on a regular grid
kp, des = sift.compute(imgGray, kp)          # SIFT descriptors at those keypoints
I'm not sure what your goal is here, but be warned, the SIFT descriptor calculation is extremely slow and was never designed to be used in a dense fashion. That being said, OpenCV makes it fairly trivial to do so.
Basically, instead of using sift.detect(), you just fill in the keypoint array yourself by making a grid of keypoints, however dense you want them. Then a descriptor will be calculated for each keypoint when you pass the keypoints to sift.compute().
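A minimal sketch of that approach (the grid step and patch size here are arbitrary choices):
import cv2

img = cv2.imread('home.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# one keypoint every 8 pixels, each covering a 16-pixel patch
step = 8
kp = [cv2.KeyPoint(float(x), float(y), 16)
      for y in range(0, gray.shape[0], step)
      for x in range(0, gray.shape[1], step)]

sift = cv2.SIFT()  # cv2.SIFT_create() in recent OpenCV versions
kp, des = sift.compute(gray, kp)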
Depending on the size of your image and the speed of your machine, this might take a very long time. If computational time is a factor, I suggest you look at some of the binary descriptors OpenCV has to offer.
Although the OpenCV way is the standard, it was too slow for me, so I used pyvlfeat instead, which is basically Python bindings to VLFeat. The functions have syntax similar to the MATLAB functions.