I'm having a hard time making this work.
My image set is made of small images (58x65).
I'm using ORB with the following parameters:
# Initiate ORB detector
# default: ORB(int nfeatures=500, float scaleFactor=1.2f, int nlevels=8, int edgeThreshold=31, int firstLevel=0, int WTA_K=2, int scoreType=ORB::HARRIS_SCORE, int patchSize=31)
orb = cv2.ORB_create(
nfeatures = 500, # The maximum number of features to retain.
scaleFactor = 1.2, # Pyramid decimation ratio, greater than 1
nlevels = 8, # The number of pyramid levels.
edgeThreshold = 7, # This is size of the border where the features are not detected. It should roughly match the patchSize parameter
firstLevel = 0, # It should be 0 in the current implementation.
WTA_K = 2, # The number of points that produce each element of the oriented BRIEF descriptor.
scoreType = cv2.ORB_HARRIS_SCORE, # The default HARRIS_SCORE means that Harris algorithm is used to rank features (the score is written to KeyPoint::score and is
# used to retain best nfeatures features); FAST_SCORE is alternative value of the parameter that produces slightly less stable
# keypoints, but it is a little faster to compute.
#scoreType = cv2.ORB_FAST_SCORE,
patchSize = 7 # size of the patch used by the oriented BRIEF descriptor. Of course, on smaller pyramid layers the perceived image area covered
# by a feature will be larger.
)
As can be seen, I changed the edgeThreshold and patchSize parameters, but I'm afraid these sizes are too small to find meaningful features.
I am testing with a pretty big set of parking lot images (~3900 images of 58x65), both empty and occupied.
But the results are not consistent: an image of a parked car (from outside the set) comes out as closer to the empty spaces than to other parked cars.
What could I be doing wrong? My guess is the parameters mentioned above. Could someone with more experience on the subject confirm this?
Edit:
Here is a small subset of the images.
Full dataset can be found here.
ORB and small images are not typically seen together because of the window size of the detector and the number of scales. Your window size is 7 x 7 and you chose 8 scales with a scale factor of 1.2. These are around the typical settings for the detector, but if you do the math you'll quickly realize that as you go further down the pyramid the window becomes too large relative to the shrunken image, prompting very few detections, if any. I would not recommend you use ORB here.
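To make the math concrete, here is a quick sketch (using the 58x65 size and the parameters from the question) that prints how big each pyramid level is next to the 7x7 patch:
# Rough sketch: how small each ORB pyramid level gets for a 58x65 image
# with scaleFactor=1.2 and nlevels=8 (values taken from the question).
w, h = 58, 65
scale_factor, nlevels, patch_size = 1.2, 8, 7
for level in range(nlevels):
    s = scale_factor ** level
    print(f"level {level}: ~{int(w / s)}x{int(h / s)} pixels "
          f"(patch is {patch_size}x{patch_size}, plus the edge border)")
By the last level the image is only about 16x18 pixels, so a 7x7 patch plus the border leaves almost no room for keypoints.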
Try using a dense feature descriptor, such as HOG or Dense SIFT which provides a feature descriptor for overlapping windows of pixels regardless of their composition. Judging from the images that you've described, this sounds like a better approach.
Assuming you have a grayscale image called im, for HOG:
import cv2
sample = ... # Path to image here
# Create HOG Descriptor object
hog = cv2.HOGDescriptor()
im = cv2.imread(sample, 0) # Grayscale image
# Compute HOG descriptor
h = hog.compute(im)
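One caveat: the default cv2.HOGDescriptor() uses a 64x128 detection window, which is larger than a 58x65 image, so compute may not give you what you expect on images this small. You may want to construct the descriptor with a smaller window; a sketch with illustrative, untuned values, reusing im from above:
# Sketch: a HOG descriptor sized for ~58x65 images. The values below are
# illustrative only; (winSize - blockSize) must be divisible by blockStride.
win_size = (56, 64)       # (width, height), must fit inside the image
block_size = (16, 16)
block_stride = (8, 8)
cell_size = (8, 8)
nbins = 9
hog_small = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, nbins)
h = hog_small.compute(im)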
For Dense SIFT:
import cv2
sample = ... # Path to image here
im = cv2.imread(sample, 0) # Grayscale image
# Create SIFT object
sift = cv2.xfeatures2d.SIFT_create()
# Provide a list of keypoints in spaces of 5 pixels horizontally and vertically
# Change the step size according to what you want
step_size = 5
kp = [cv2.KeyPoint(x, y, step_size) for y in range(0, im.shape[0], step_size)
                                    for x in range(0, im.shape[1], step_size)]
# Calculate the Dense SIFT feature vector (compute returns the keypoints and the descriptors)
kp, dense_feat = sift.compute(im, kp)
Note that for the SIFT descriptor, you will need to install the opencv-contrib-python flavour of the library (i.e. pip install opencv-contrib-python).
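Once every image has a fixed-length descriptor, ranking is straightforward; for example, a rough sketch of a nearest-neighbour lookup (the all_descriptors list below is a placeholder for one hog.compute(...) result per dataset image):
import numpy as np
# Sketch: nearest-neighbour ranking on the HOG vectors.
# all_descriptors is a hypothetical list with one descriptor per dataset image;
# h is the descriptor of the query image from the snippet above.
features = np.vstack([d.ravel() for d in all_descriptors])
dists = np.linalg.norm(features - h.ravel(), axis=1)  # Euclidean distance to each image
closest = np.argsort(dists)[:5]                       # indices of the 5 most similar images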
I have two images from a video, frameA and frameB. Assuming the video is panning, slowly, one can imagine that frameA and frameB have significant overlap. We can then create a panorama from the video footage.
I have tried using OpenCV's Stitcher, SURF/ORB detectors with BF matching, and a few vanilla approaches. None of them produce the results that I need. The main problem I have identified is that SURF/ORB identifies too "small" a region of interest and matches it incorrectly.
Example: I am in a desert with 1 single cactus in my view. I am panning across it.
SURF/ORB detects regions of interest such as the EDGES of my cactus against the sky/land and is then unable to match them (not sure why) in the next frame. The things it does detect don't match up well, and when you compute a homography it matches, say, the middle of the cactus with the top part of the cactus in the next frame... and the result gets warped.
Is there a way to do the following?
Enforce only rotation and translation between two frames? Note that there is "new" information in subsequent frames, so they can never overlap 100%.
Find the best rotation and translation, under the base assumption that a best match exists? (I am panning very, very slowly and guarantee high overlap.)
Ignore minor fluctuations. If my feature detectors were "large" enough, they would say "cactus in frame 1" matches "cactus in frame 2", translate by X,Y and maybe rotate by Z.
My attempt at a solution is to take the entire picture, do an "overlapping" sweep, and compute the difference at each offset. Where I have a minimum, I have the proper X,Y shift. This however has two problems:
It's slow. Way too slow.
It can't handle rotation without becoming even slower, because the search space grows.
import time
import cv2
import numpy as np

# load image 1
image1 = cv2.imread('img1.png')
print(image1.shape)
img1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
nw1, nh1 = img1.shape            # note: numpy shape is (rows, cols)
nw15, nh15 = int(nw1/2), int(nh1/2)
# load image 2
image2 = cv2.imread('img2.png')
img2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
nw2, nh2 = img2.shape
nw25, nh25 = int(nw2/2), int(nh2/2)
# generate base canvas; note that img1 could be top left of img2, or img2 could be top left of img1
# the search space of this is very large
nw, nh = nw1 + nw2*2, nh1 + nh2*2
cnw, cnh = int(nw/2), int(nh/2)  # get the center point for later calculations
base_image1 = np.ones((nw, nh), np.uint8) * 255  # make the background white
base_image1[cnw-nw15: cnw+nw15, cnh-nh15: cnh+nh15] = img1  # set the first image in the center
# create the image we want to "sweep over"; we "pre-allocate" since creating new ones is expensive
sweep_image = np.zeros((nw, nh), np.uint8)  # keep at 0 for BLACK
stime = time.time()
total_blend = []
# sweep over my search space!
for x_s in np.arange(20, 80):        # limit search space so it finishes this year
    for y_s in np.arange(300, 500):  # limit search space so it finishes this year
        w1, w2 = cnw-nw25+x_s, cnw+nw25+x_s  # get the width slice to set our sweep image
        h1, h2 = cnh-nh25+y_s, cnh+nh25+y_s  # get the height slice to set our sweep image
        sweep_image[w1: w2, h1: h2] = img2   # set the image
        diff = cv2.absdiff(base_image1, sweep_image)  # calculate the difference
        total_blend.append([x_s, y_s, np.sum(diff)])  # store the transformation and coordinates
        sweep_image[w1: w2, h1: h2] = 0      # reset back to zero
print(time.time() - stime)
# show the last difference image
cv2.imshow('diff', diff)
cv2.waitKey(0)
# convert to array and pick the offset with the smallest difference
total_blend = np.array(total_blend)
mymin = np.min(total_blend[:, 2])
print(total_blend[total_blend[:, 2] == mymin])  # get the best coordinates for translation
Examples below:
Example 1: note the giant white borders, which come from keeping the images the same size across the ENTIRE search space. This is an OK-ish match, but notice how the dark regions aren't very dark.
Example 2: again with large white borders, but notice how the dark regions are actually black. This is close to the minimum.
All help and thoughts appreciated. Is there a way to dictate the "size" of feature detectors? Is there a faster way to sweep? Maybe some RMSE and numpy eigenvalues - this is linear algebra after all...?
I am using Python 3 and OpenCV.
So far I have gone with creating my own keypoints, similar to a dense feature detector. Unlike SIFT/corners/ORB or other detectors that find small features, a dense feature detector can be thought of as taking keypoints on a grid across the entire image.
(More here: https://subscription.packtpub.com/book/application-development/9781785283932/10/ch10lvl1sec81/what-is-a-dense-feature-detector)
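For instance, a rough sketch of that idea (frameA and frameB are assumed to already be grayscale arrays, and the step/patch sizes are placeholders): compute ORB descriptors on a fixed grid in both frames, match them, and then fit only rotation + translation (+ uniform scale) instead of a full homography:
import cv2
import numpy as np

def dense_keypoints(img, step=20, size=40):
    # Keypoints on a regular grid; `size` controls how big a patch each
    # descriptor sees, i.e. the "size" of the feature.
    return [cv2.KeyPoint(float(x), float(y), size)
            for y in range(0, img.shape[0], step)
            for x in range(0, img.shape[1], step)]

orb = cv2.ORB_create()
kpA, desA = orb.compute(frameA, dense_keypoints(frameA))
kpB, desB = orb.compute(frameB, dense_keypoints(frameB))

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(desA, desB)
ptsA = np.float32([kpA[m.queryIdx].pt for m in matches])
ptsB = np.float32([kpB[m.trainIdx].pt for m in matches])

# Partial affine = rotation + uniform scale + translation (no shear, no perspective),
# estimated robustly with RANSAC.
M, inliers = cv2.estimateAffinePartial2D(ptsA, ptsB, method=cv2.RANSAC)
print(M)  # 2x3 matrix; the last column is the estimated translation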
I have this image:
I am interested in segmenting only the objects that appear in the image, so I did something like this:
import numpy as np
import cv2
from sklearn.cluster import MeanShift, estimate_bandwidth
#from skimage.color import rgb2lab
#Loading original image
originImg = cv2.imread('test/2019_00254.jpg')
# Shape of original image
originShape = originImg.shape
# Converting image into array of dimension [nb of pixels in originImage, 3]
# based on r g b intensities
flatImg=np.reshape(originImg, [-1, 3])
# Estimate bandwidth for meanshift algorithm
bandwidth = estimate_bandwidth(flatImg, quantile=0.1, n_samples=100)
ms = MeanShift(bandwidth = bandwidth, bin_seeding=True)
# Performing meanshift on flatImg
ms.fit(flatImg)
# (r,g,b) vectors corresponding to the different clusters after meanshift
labels=ms.labels_
# Remaining colors after meanshift
cluster_centers = ms.cluster_centers_
# Finding and displaying the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)
segmentedImg = cluster_centers[np.reshape(labels, originShape[:2])]
cv2.imshow('Image',segmentedImg.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
But the problem is that it segments the whole image, including the background. How can I run the segmentation on the objects only? Note that I have the bbox coordinates of each object.
I'd suggest you use a more straightforward input to understand (and feel) all the limitations behind the approach. The input you have is complex in terms of resolution, colors, scene complexity, object complexity, etc.
Anyway, to make this answer useful, let's do some experiments:
Detectron2, PointRend segmentation
Just in case you expect a complex model to handle the scene properly. Segmentation:
Masks:
No miracle here. The scene and objects are complex.
Monocular depth estimation
Let's try depth estimation as an obvious way to get rid of the background.
Depth (also check this example):
Result:
Part of the background is gone, but it does nothing to separate the other objects.
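For reference, the background removal step itself is just a threshold on the depth map; a sketch, assuming depth is a float array aligned with originImg where larger values mean closer (flip the comparison otherwise):
import numpy as np
# Sketch: drop far-away pixels using an estimated depth map.
near_mask = depth > np.percentile(depth, 60)  # keep roughly the nearest 40% of pixels
foreground = originImg.copy()
foreground[~near_mask] = 0                    # zero out the (assumed) background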
Long story short, start with something simple to see the exact way your solution works.
BTW, it is always hard to work with thin and delicate details, so it is better to avoid that complexity if possible.
I ran the SLIC (Simple Linear Iterative Clustering) superpixel algorithm from OpenCV and skimage on the same picture but got different results; the skimage SLIC result is better, as shown in the pictures below. The first one is OpenCV SLIC, the second one is skimage SLIC. I have several questions and hope someone can help.
Why does OpenCV have the parameter 'region_size' while skimage uses 'n_segments'?
Is converting to LAB and applying a Gaussian blur necessary?
Is there any trick to optimize the OpenCV SLIC result?
===================================
OpenCV SLIC
Skimage SLIC
import cv2
import skimage.segmentation
from skimage import io

ximg = cv2.ximgproc  # requires opencv-contrib-python

# OpenCV
src = cv2.imread('pic.jpg')  # read image
# Gaussian blur
src = cv2.GaussianBlur(src, (5, 5), 0)
# Convert to LAB
src_lab = cv2.cvtColor(src, cv2.COLOR_BGR2LAB)
# SLIC
cv_slic = ximg.createSuperpixelSLIC(src_lab, algorithm=ximg.SLICO,
                                    region_size=32)
cv_slic.iterate()

# Skimage
src = io.imread('pic.jpg')
sk_slic = skimage.segmentation.slic(src, n_segments=256, sigma=5)
Image with superpixel centroids, generated with the code below:
from skimage.measure import regionprops
import matplotlib.pyplot as plt

# Measure properties of labeled image regions
# (labels is the superpixel label image, e.g. the skimage SLIC output)
regions = regionprops(labels)
# Scatter the centroid of each superpixel
plt.scatter([r.centroid[1] for r in regions],
            [r.centroid[0] for r in regions], c='red')
But there is one superpixel missing (top-left corner), and I found that len(regions) is 64 while len(np.unique(labels)) is 65. Why?
I'm not sure why you think skimage slic is better (and I maintain skimage! 😂), but:
Different parameterizations are common in mathematics and computer science. Whether you use region size or number of segments, you should get the same result. I expect the formula to convert between the two to be something like n_segments = image.size / region_size.
The original paper suggests that for natural images (meaning images of the real world like you showed, rather than e.g. images from a microscope or from astronomy), converting to Lab gives better results.
To me, based on your results, it looks like the Gaussian blur used for scikit-image was stronger than for OpenCV, so you could make the results more similar by playing with the sigma. I also think the compactness parameter is probably not identical between the two.
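If you want to experiment along those lines, something like the following (the values are just starting points to tune, not recommendations):
import skimage.segmentation
# Sketch: expose the knobs mentioned above. sigma controls the Gaussian pre-blur,
# compactness trades colour similarity against spatial proximity.
sk_slic_tuned = skimage.segmentation.slic(src, n_segments=256,
                                          compactness=10, sigma=1)
On the OpenCV side, cv_slic.getLabels() gives you the raw label map, which makes a direct, number-for-number comparison of the two results easier.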
I have 10 greyscale brain MRI scans from BrainWeb. They are stored as a 4d numpy array, brains, with shape (10, 181, 217, 181). Each of the 10 brains is made up of 181 slices along the z-plane (going through the top of the head to the neck) where each slice is 181 pixels by 217 pixels in the x (ear to ear) and y (eyes to back of head) planes respectively.
All of the brains are of type dtype('float64'). The maximum pixel intensity across all brains is ~1328 and the minimum is ~0. For example, for the first brain, I calculate this with brains[0].max() giving 1328.338086605072 and brains[0].min() giving 0.0003886114541273855. Below is a plot of a slice of brains[0]:
I want to binarize all these brain images by rescaling the pixel intensities from [0, 1328] to {0, 1}. Is my method correct?
I do this by first normalising the pixel intensities to [0, 1]:
normalized_brains = brains/1328
And then by using the binomial distribution to binarize each pixel:
binarized_brains = np.random.binomial(1, (normalized_brains))
The plotted result looks correct:
A 0 pixel intensity represents black (background) and 1 pixel intensity represents white (brain).
I experimented with another method to normalise an image from this post, but it gave me just a black image. This is because np.finfo(np.float64).max is 1.7976931348623157e+308, so the normalization step
normalized_brains = brains/1.7976931348623157e+308
just returned an array of zeros, which in the binarization step also led to an array of zeros.
Am I binarising my images using a correct method?
Your method of converting the image to a binary image basically amounts to random dithering, which is a poor method of creating the illusion of grey values on a binary medium. Old-fashioned print is a binary medium, and the methods for representing grey-value photographs in print have been fine-tuned over centuries. This process is called halftoning, and it is shaped in part by properties of ink on paper that we do not have to deal with in binary images.
So what methods have people come up with outside of print? Ordered dithering (mostly Bayer matrix), and error diffusion dithering. Read more about dithering on Wikipedia. I wrote a blog post showing how to implement all of these methods in MATLAB some years ago.
I would recommend you use error diffusion dithering for your particular application. Here is some code in MATLAB (taken from my blog post linked above) for the Floyd-Steinberg algorithm; I hope that you can translate this to Python:
img = imread('https://i.stack.imgur.com/d5E9i.png');
img = img(:,:,1);
out = double(img);
sz = size(out);
for ii=1:sz(1)
   for jj=1:sz(2)
      old = out(ii,jj);
      %new = 255*(old >= 128); % Original Floyd-Steinberg
      new = 255*(old >= 128+(rand-0.5)*100); % Simple improvement
      out(ii,jj) = new;
      err = new-old;
      if jj<sz(2)
         % right
         out(ii ,jj+1) = out(ii ,jj+1)-err*(7/16);
      end
      if ii<sz(1)
         if jj<sz(2)
            % right-down
            out(ii+1,jj+1) = out(ii+1,jj+1)-err*(1/16);
         end
         % down
         out(ii+1,jj ) = out(ii+1,jj )-err*(5/16);
         if jj>1
            % left-down
            out(ii+1,jj-1) = out(ii+1,jj-1)-err*(3/16);
         end
      end
   end
end
imshow(out)
Resampling the image before applying the dithering greatly improves the results:
img = imresize(img,4);
% (repeat code above)
imshow(out)
NOTE that the above process expects the input to be in the range [0,255]. It is easy to adapt to a different range, say [0,1328] or [0,1], but it is also easy to scale your images to the [0,255] range.
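Since the answer invites a Python translation, here is a rough NumPy sketch of the same Floyd-Steinberg loop (without the random tweak on the threshold), assuming the input has already been scaled to [0, 255]:
import numpy as np

def floyd_steinberg(img):
    # img: 2D float array scaled to [0, 255]; returns an array of 0s and 255s.
    out = np.array(img, dtype=float)
    rows, cols = out.shape
    for i in range(rows):
        for j in range(cols):
            old = out[i, j]
            new = 255.0 if old >= 128 else 0.0
            out[i, j] = new
            err = new - old
            if j + 1 < cols:
                out[i, j + 1] -= err * 7 / 16          # right
            if i + 1 < rows:
                if j + 1 < cols:
                    out[i + 1, j + 1] -= err * 1 / 16  # right-down
                out[i + 1, j] -= err * 5 / 16          # down
                if j > 0:
                    out[i + 1, j - 1] -= err * 3 / 16  # left-down
    return out

# e.g. (illustrative): binarized_slice = floyd_steinberg(brains[0][90] / 1328 * 255)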
Have you tried a threshold on the image?
This is a common way to binarize images, rather than trying to apply a random binomial distribution. You could try something like:
binarized_brains = (brains > threshold_value).astype(int)
which returns an array of 0s and 1s according to whether the image value was less than or greater than your chosen threshold value.
You will have to experiment with the threshold value to find the best one for your images, but it does not need to be normalized first.
If this doesn't work well, you can also experiment with the thresholding options available in the skimage filters package.
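For instance, a small sketch using Otsu's method from skimage to pick the threshold automatically (applied to one brain here):
from skimage.filters import threshold_otsu
# Sketch: let Otsu's method choose the threshold instead of hand-tuning it.
t = threshold_otsu(brains[0])
binarized = (brains[0] > t).astype(int)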
It is easy in OpenCV. As mentioned, a very common way is defining a threshold, but your result looks like you are allocating random values to your intensities instead of thresholding them.
import cv2
im = cv2.imread('brain.png', cv2.IMREAD_GRAYSCALE)
# Option 1: let Otsu's method pick the threshold automatically
(th, brain_bw) = cv2.threshold(im, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Option 2: define the threshold yourself
th = 128  # DEFINE HERE: pick a threshold that suits your images
(_, im_bin) = cv2.threshold(im, th, 255, cv2.THRESH_BINARY)
cv2.imwrite('binBrain.png', brain_bw)
brain
binBrain
I use OpenCV and Python, and I want to remove the small connected objects from my image.
I have the following binary image as input:
The image is the result of this code:
dilation = cv2.dilate(dst,kernel,iterations = 2)
erosion = cv2.erode(dilation,kernel,iterations = 3)
I want to remove the objects highlighted in red:
How can I achieve this using OpenCV?
How about with connectedComponentsWithStats (doc):
import cv2
import numpy as np

# im is your binary input image.
# find all of the connected components (white blobs in your image).
# im_with_separated_blobs is an image where each detected blob has a different
# pixel value, ranging from 1 to nb_blobs - 1 (0 is the background).
nb_blobs, im_with_separated_blobs, stats, _ = cv2.connectedComponentsWithStats(im)
# stats (and the silenced output centroids) gives some information about the blobs.
# See the docs for more information.
# Here, we're interested only in the size of the blobs, contained in the last column of stats.
sizes = stats[:, -1]
# The following lines take out the background, which is also considered a component,
# which for most applications is not the expected output.
# You may also keep the results as they are by commenting out the following lines;
# you'll then have to update the ranges in the for loop below.
sizes = sizes[1:]
nb_blobs -= 1
# minimum size of particles we want to keep (number of pixels).
# Here it's a fixed value, but you can set it as you want, e.g. the mean of the sizes.
min_size = 150
# output image with only the kept components
im_result = np.zeros_like(im_with_separated_blobs)
# for every component in the image, keep it only if it's above min_size
for blob in range(nb_blobs):
    if sizes[blob] >= min_size:
        # see description of im_with_separated_blobs above
        im_result[im_with_separated_blobs == blob + 1] = 255
Output:
In order to remove objects automatically you need to locate them in the image.
From the image you provided I see nothing that distinguishes the 7 highlighted items from others.
You have to tell your computer how to recognize objects you don't want. If they look the same, this is not possible.
If you have multiple images where the objects always look like that you could use template matching techniques.
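If they look similar enough across images, template matching could look something like this (im is your binary image, 'template.png' is a hypothetical crop of one unwanted object, and 0.8 is an arbitrary score threshold):
import cv2
import numpy as np
# Sketch: locate copies of a known-looking object and erase them from the image.
template = cv2.imread('template.png', 0)
th, tw = template.shape
res = cv2.matchTemplate(im, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res >= 0.8)
for x, y in zip(xs, ys):
    im[y:y + th, x:x + tw] = 0  # blank out each match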
Also the closing operation doesn't make much sense to me.
For isolated or unconnected blobs: try this (you can set noise_removal_threshold to whatever you like, e.g. make it relative to the largest contour, or use a nominal value like 100 or 25).
# Find the external contours first (OpenCV 4.x signature), then keep only the large ones
contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
noise_removal_threshold = 100  # nominal value; tune for your image
mask = np.zeros_like(img)
for contour in contours:
    area = cv2.contourArea(contour)
    if area > noise_removal_threshold:
        cv2.fillPoly(mask, [contour], 255)
Removing small connected components by area is called area opening. OpenCV does not have this as a function; it can be implemented as shown in the other answers. But most other image processing packages will have an area opening function.
For example using scikit-image:
import skimage
import imageio.v3 as iio
img = iio.imread('cQMZm.png')[:,:,0]
out = skimage.morphology.area_opening(img, area_threshold=150, connectivity=2)
For example using DIPlib:
import diplib as dip
out = dip.AreaOpening(img, filterSize=150, connectivity=2)
PS: The DIPlib implementation is noticeably faster. Disclaimer: I'm an author of DIPlib.