I have this image:
I am interested in doing segmentation only on the objects that appear in the image, so I did something like this:
import numpy as np
import cv2
from sklearn.cluster import MeanShift, estimate_bandwidth
#from skimage.color import rgb2lab
#Loading original image
originImg = cv2.imread('test/2019_00254.jpg')
# Shape of original image
originShape = originImg.shape
# Converting image into array of dimension [nb of pixels in originImage, 3]
# based on r g b intensities
flatImg=np.reshape(originImg, [-1, 3])
# Estimate bandwidth for meanshift algorithm
bandwidth = estimate_bandwidth(flatImg, quantile=0.1, n_samples=100)
ms = MeanShift(bandwidth = bandwidth, bin_seeding=True)
# Performing meanshift on flatImg
ms.fit(flatImg)
# (r,g,b) vectors corresponding to the different clusters after meanshift
labels=ms.labels_
# Remaining colors after meanshift
cluster_centers = ms.cluster_centers_
# Finding and displaying the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)
segmentedImg = cluster_centers[np.reshape(labels, originShape[:2])]
cv2.imshow('Image',segmentedImg.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
But the problem is that it's doing segmentation on the whole image, including the background. How can I do segmentation on the objects only? Note that I have the bbox coordinates of each object.
I'd suggest you use a more straightforward input to understand (and feel) all the limitations behind the approach. The input you have is complex in terms of resolution, colors, scene complexity, object complexity, etc.
Anyway, to make this answer useful, let's do some experiments:
Detectron2, PointRend segmentation
Just in case you expect a complex model to handle the scene properly. Segmentation:
Masks:
No miracle here. The scene and objects are complex.
Monocular depth estimation
Let's try depth estimation as an obvious way to get rid of the background.
Depth (also check this example):
Result:
Part of the background is gone, but it does nothing for the other objects.
Long story short, start with something simple to see the exact way your solution works.
BTW, it is always hard to work with thin and delicate details, so it is better to avoid that complexity if possible.
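If you still want to run mean shift per object, one simple thing to try is to crop each bbox and segment each crop independently. A minimal sketch, assuming your bbox coordinates are (x1, y1, x2, y2) pixel tuples (the bboxes list below is hypothetical):
import numpy as np
import cv2
from sklearn.cluster import MeanShift, estimate_bandwidth

originImg = cv2.imread('test/2019_00254.jpg')
segmentedImg = originImg.copy()

# Hypothetical example boxes: one (x1, y1, x2, y2) tuple per object
bboxes = [(50, 80, 200, 260), (300, 120, 420, 300)]

for (x1, y1, x2, y2) in bboxes:
    crop = originImg[y1:y2, x1:x2]
    flat = crop.reshape(-1, 3)
    # Bandwidth is estimated per crop; you may need to tune the quantile
    bandwidth = estimate_bandwidth(flat, quantile=0.1, n_samples=100)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    ms.fit(flat)
    centers = ms.cluster_centers_
    # Paste the per-object segmentation back into the full image
    segmentedImg[y1:y2, x1:x2] = centers[ms.labels_].reshape(crop.shape).astype(np.uint8)

cv2.imshow('Image', segmentedImg)
cv2.waitKey(0)
cv2.destroyAllWindows()
This keeps the background untouched and only re-colors the pixels inside each box, which matches the "objects only" requirement; whether mean shift in color space gives usable masks inside those boxes is a separate question (see the experiments above).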
Related
I'm having a hard time making this work.
My image set is made of small images (58x65).
I'm using ORB with the following parameters:
# Initiate ORB detector
# default: ORB(int nfeatures=500, float scaleFactor=1.2f, int nlevels=8, int edgeThreshold=31, int firstLevel=0, int WTA_K=2, int scoreType=ORB::HARRIS_SCORE, int patchSize=31)
orb = cv2.ORB_create(
nfeatures = 500, # The maximum number of features to retain.
scaleFactor = 1.2, # Pyramid decimation ratio, greater than 1
nlevels = 8, # The number of pyramid levels.
edgeThreshold = 7, # This is size of the border where the features are not detected. It should roughly match the patchSize parameter
firstLevel = 0, # It should be 0 in the current implementation.
WTA_K = 2, # The number of points that produce each element of the oriented BRIEF descriptor.
scoreType = cv2.ORB_HARRIS_SCORE, # The default HARRIS_SCORE means that Harris algorithm is used to rank features (the score is written to KeyPoint::score and is
# used to retain best nfeatures features); FAST_SCORE is alternative value of the parameter that produces slightly less stable
# keypoints, but it is a little faster to compute.
#scoreType = cv2.ORB_FAST_SCORE,
patchSize = 7 # size of the patch used by the oriented BRIEF descriptor. Of course, on smaller pyramid layers the perceived image area covered
# by a feature will be larger.
)
As can be seen, I changed the edgeThreshold and patchSize parameters, but I'm afraid these sizes are too small to find meaningful features.
I am testing with a pretty big set of parking lot images (~3900 images of 58x65), both empty and occupied.
But the results are not consistent: an image of a parked car (from outside the set) is matched closer to empty spaces than to other parked cars.
What could I be doing wrong? My guess is the parameters mentioned above. Could someone with more experience on the subject confirm this?
Edit:
Here is a small subset of the images.
Full dataset can be found here.
ORB and small images are not typically seen together because of the window size of the detector and the number of scales. Your window size is 7 x 7, and you chose 8 scales with a scaling factor of 1.2. These are around the typical settings for the detector, but if you do the math you'll quickly realize that as you go further down the pyramid (to smaller images), the fixed window becomes too large relative to the image, prompting very few detections, if any. I would not recommend you use ORB here.
Try using a dense feature descriptor, such as HOG or Dense SIFT, which provides a feature descriptor for overlapping windows of pixels regardless of their composition. Judging from the images you've described, this sounds like a better approach.
Assuming you have a grayscale image called im, for HOG:
import cv2
sample = ... # Path to image here
# Create HOG Descriptor object
hog = cv2.HOGDescriptor()
im = cv2.imread(sample, 0) # Grayscale image
# Compute HOG descriptor
h = hog.compute(im)
For Dense SIFT:
import cv2
sample = ... # Path to image here
im = cv2.imread(sample, 0) # Grayscale image
# Create SIFT object
sift = cv2.xfeatures2d.SIFT_create()
# Provide a list of keypoints in spaces of 5 pixels horizontally and vertically
# Change the step size according to what you want
step_size = 5
kp = [cv2.KeyPoint(x, y, step_size) for y in range(0, im.shape[0], step_size)
                                    for x in range(0, im.shape[1], step_size)]
# Calculate the Dense SIFT feature vector
dense_feat = sift.compute(im, kp)
Note that for the SIFT descriptor, you will need to install the opencv-contrib-python flavour of the library (i.e. pip install opencv-contrib-python).
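As a concrete starting point, here is a minimal sketch for comparing two parking-spot crops via their HOG descriptors. The file names are hypothetical, and note that the default HOGDescriptor window is 64x128, which is larger than your 58x65 images, so the sketch resizes the crops and uses a custom 64x64 window:
import cv2
import numpy as np

# Hypothetical file names for two parking-spot crops
files = ['spot_empty.png', 'spot_occupied.png']

# 64x64 window, 16x16 blocks, 8x8 block stride, 8x8 cells, 9 orientation bins
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

descs = []
for f in files:
    im = cv2.imread(f, 0)            # grayscale
    im = cv2.resize(im, (64, 64))    # 58x65 -> 64x64 so one window fits
    descs.append(hog.compute(im).ravel())

# Crude similarity check: Euclidean distance between the two descriptors
print(np.linalg.norm(descs[0] - descs[1]))
For the full dataset you would compute one descriptor per image and feed them into whatever matcher or classifier you are using.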
I ran the SLIC (Simple Linear Iterative Clustering) superpixels algorithm from OpenCV and skimage on the same picture, but got different results; the skimage SLIC result looks better, as shown in the pictures below. The first one is OpenCV SLIC, the second one is skimage SLIC. I have several questions and hope someone can help.
Why does OpenCV have the parameter 'region_size' while skimage uses 'n_segments'?
Is converting to LAB and applying a Gaussian blur necessary?
Is there any trick to optimize the OpenCV SLIC result?
===================================
OpenCV SLIC
Skimage SLIC
# OpenCV (ximgproc requires the opencv-contrib-python package)
import cv2
ximg = cv2.ximgproc

src = cv2.imread('pic.jpg')  # read image (BGR)
# Gaussian blur
src = cv2.GaussianBlur(src, (5, 5), 0)
# Convert to LAB
src_lab = cv2.cvtColor(src, cv2.COLOR_BGR2LAB)
# SLIC
cv_slic = ximg.createSuperpixelSLIC(src_lab, algorithm=ximg.SLICO,
                                    region_size=32)
cv_slic.iterate()
# Skimage
import skimage.segmentation
from skimage import io

src = io.imread('pic.jpg')
sk_slic = skimage.segmentation.slic(src, n_segments=256, sigma=5)
Image with superpixel centroids generated with the code below:
from skimage.measure import regionprops
import matplotlib.pyplot as plt

# labels: a superpixel label image (e.g. from the skimage SLIC call above)
# Measure properties of labeled image regions
regions = regionprops(labels)
# Scatter the centroid of each superpixel
plt.scatter([r.centroid[1] for r in regions],
            [r.centroid[0] for r in regions], c='red')
but there is one superpixel missing (top-left corner), and I found that len(regions) is 64 while len(np.unique(labels)) is 65. Why?
I'm not sure why you think skimage slic is better (and I maintain skimage! 😂), but:
Different parameterizations are common in mathematics and computer science. Whether you use region size or number of segments, you should get the same result. I expect the formula to convert between the two to be something like n_segments = image.size / region_size (see the conversion sketch after these points).
The original paper suggests that for natural images (meaning images of the real world like you showed, rather than e.g. images from a microscope or from astronomy), converting to Lab gives better results.
To me, based on your results, it looks like the Gaussian blur used for scikit-image was stronger than for OpenCV, so you could make the results more similar by playing with the sigma. I also think the compactness parameter is probably not identical between the two.
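To make the two libraries roughly comparable, here is a small conversion sketch. It assumes (my reading of the OpenCV docs, not something stated in this thread) that region_size is the average superpixel side length in pixels:
import cv2
import numpy as np

img = cv2.imread('pic.jpg')
h, w = img.shape[:2]

# If region_size is the (approximate) side length of one superpixel,
# then the superpixel count is about the image area / region_size**2.
region_size = 32
approx_n_segments = (h * w) // (region_size ** 2)

# Conversely, to aim for a given skimage n_segments:
n_segments = 256
approx_region_size = int(np.sqrt(h * w / n_segments))

print(approx_n_segments, approx_region_size)
With region_size = 32 on, say, a 512x512 image this gives roughly 256 superpixels, so the two calls above should be in the same ballpark.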
I have 10 greyscale brain MRI scans from BrainWeb. They are stored as a 4d numpy array, brains, with shape (10, 181, 217, 181). Each of the 10 brains is made up of 181 slices along the z-plane (going through the top of the head to the neck) where each slice is 181 pixels by 217 pixels in the x (ear to ear) and y (eyes to back of head) planes respectively.
All of the brains are of dtype('float64'). The maximum pixel intensity across all brains is ~1328 and the minimum is ~0. For example, for the first brain, I calculate this by brains[0].max() giving 1328.338086605072 and brains[0].min() giving 0.0003886114541273855. Below is a plot of a slice of brains[0]:
I want to binarize all these brain images by rescaling the pixel intensities from [0, 1328] to {0, 1}. Is my method correct?
I do this by first normalising the pixel intensities to [0, 1]:
normalized_brains = brains/1328
And then by using the binomial distribution to binarize each pixel:
binarized_brains = np.random.binomial(1, (normalized_brains))
The plotted result looks correct:
A 0 pixel intensity represents black (background) and 1 pixel intensity represents white (brain).
I experimented with another method to normalise an image from this post, but it gave me just a black image. This is because np.finfo(np.float64).max is 1.7976931348623157e+308, so the normalization step
normalized_brains = brains/1.7976931348623157e+308
just returned an array of zeros, which in the binarization step also led to an array of zeros.
Am I binarising my images using a correct method?
Your method of converting the image to a binary image basically amounts to random dithering, which is a poor method of creating the illusion of grey values on a binary medium. Old-fashioned print is a binary medium; printers have fine-tuned the methods to represent grey-value photographs in print over centuries. This process is called halftoning, and it is shaped in part by properties of ink on paper that we do not have to deal with in binary images.
So what methods have people come up with outside of print? Ordered dithering (mostly Bayer matrix), and error diffusion dithering. Read more about dithering on Wikipedia. I wrote a blog post showing how to implement all of these methods in MATLAB some years ago.
I would recommend you use error diffusion dithering for your particular application. Here is some code in MATLAB (taken from my blog post linked above) for the Floyd-Steinberg algorithm; I hope that you can translate this to Python:
img = imread('https://i.stack.imgur.com/d5E9i.png');
img = img(:,:,1);
out = double(img);
sz = size(out);
for ii=1:sz(1)
   for jj=1:sz(2)
      old = out(ii,jj);
      %new = 255*(old >= 128); % Original Floyd-Steinberg
      new = 255*(old >= 128+(rand-0.5)*100); % Simple improvement
      out(ii,jj) = new;
      err = new-old;
      if jj<sz(2)
         % right
         out(ii  ,jj+1) = out(ii  ,jj+1)-err*(7/16);
      end
      if ii<sz(1)
         if jj<sz(2)
            % right-down
            out(ii+1,jj+1) = out(ii+1,jj+1)-err*(1/16);
         end
         % down
         out(ii+1,jj  ) = out(ii+1,jj  )-err*(5/16);
         if jj>1
            % left-down
            out(ii+1,jj-1) = out(ii+1,jj-1)-err*(3/16);
         end
      end
   end
end
imshow(out)
Resampling the image before applying the dithering greatly improves the results:
img = imresize(img,4);
% (repeat code above)
imshow(out)
NOTE that the above process expects the input to be in the range [0,255]. It is easy to adapt to a different range, say [0,1328] or [0,1], but it is also easy to scale your images to the [0,255] range.
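Since the question is in Python, here is a rough translation of the plain (non-randomized) Floyd-Steinberg loop above, as a sketch that assumes a 2D array already scaled to [0, 255]:
import numpy as np

def floyd_steinberg(img):
    """Binarize a 2D array in [0, 255] with Floyd-Steinberg error diffusion."""
    out = img.astype(np.float64)
    h, w = out.shape
    for i in range(h):
        for j in range(w):
            old = out[i, j]
            new = 255.0 if old >= 128 else 0.0
            out[i, j] = new
            err = new - old
            if j + 1 < w:
                out[i, j + 1] -= err * 7 / 16          # right
            if i + 1 < h:
                if j + 1 < w:
                    out[i + 1, j + 1] -= err * 1 / 16  # right-down
                out[i + 1, j] -= err * 5 / 16          # down
                if j > 0:
                    out[i + 1, j - 1] -= err * 3 / 16  # left-down
    return out.astype(np.uint8)  # every pixel ends up exactly 0 or 255

# e.g. for one brain slice scaled to [0, 255]:
# dithered = floyd_steinberg(brains[0][90] / 1328 * 255)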
Have you tried a threshold on the image?
This is a common way to binarize images, rather than trying to apply a random binomial distribution. You could try something like:
binarized_brains = (brains > threshold_value).astype(int)
which returns an array of 0s and 1s according to whether the image value was less than or greater than your chosen threshold value.
You will have to experiment with the threshold value to find the best one for your images, but the image does not need to be normalized first.
If this doesn't work well, you can also experiment with the thresholding options available in the skimage filters package.
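For instance, a minimal sketch with Otsu's method from skimage.filters (assuming brains is the (10, 181, 217, 181) array from the question):
import numpy as np
from skimage.filters import threshold_otsu

brains = ...  # the (10, 181, 217, 181) float64 array from the question

t = threshold_otsu(brains[0])                 # pick a threshold for the first brain
binarized = (brains[0] > t).astype(np.uint8)  # array of 0s and 1s
print(t, binarized.shape)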
It is easy in OpenCV. As mentioned, a very common way is defining a threshold. But your result looks like you are assigning random values to your intensities instead of thresholding them.
import cv2

im = cv2.imread('brain.png', cv2.IMREAD_GRAYSCALE)

# Option 1: let Otsu's method pick the threshold automatically
(th, brain_bw) = cv2.threshold(im, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Option 2: pick the threshold yourself
# th = (DEFINE HERE)
# (th, im_bin) = cv2.threshold(im, th, 255, cv2.THRESH_BINARY)

cv2.imwrite('binBrain.png', brain_bw)
brain
binBrain
I am trying to perform image segmentation using scikit mean shift algorithm. I use opencv to display the segmented image.
My problem is the following: I use the code as given in different examples, and when I display the image after segmentation, I get a black image. I was wondering if someone could see what my mistake is...
Thanks a lot for the help !
Here is my code:
import numpy as np
import cv2
from sklearn.cluster import MeanShift, estimate_bandwidth
#Loading original image
originImg = cv2.imread('Swimming_Pool.jpg')
# Shape of original image
originShape = originImg.shape
# Converting image into array of dimension [nb of pixels in originImage, 3]
# based on r g b intensities
flatImg=np.reshape(originImg, [-1, 3])
# Estimate bandwidth for meanshift algorithm
bandwidth = estimate_bandwidth(flatImg, quantile=0.1, n_samples=100)
ms = MeanShift(bandwidth = bandwidth, bin_seeding=True)
# Performing meanshift on flatImg
ms.fit(flatImg)
# (r,g,b) vectors corresponding to the different clusters after meanshift
labels=ms.labels_
# Remaining colors after meanshift
cluster_centers = ms.cluster_centers_
# Finding and displaying the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)
# Displaying segmented image
segmentedImg = np.reshape(labels, originShape[:2])
cv2.imshow('Image',segmentedImg)
cv2.waitKey(0)
cv2.destroyAllWindows()
For displaying the image, the correct code would be:
segmentedImg = cluster_centers[np.reshape(labels, originShape[:2])]
cv2.imshow('Image', segmentedImg.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
I tried your method of segmentation on a random sample photo, and the segmentation looked bad, probably because your mean shift works only on the color space, so it loses the locality information. The Python package skimage comes with a segmentation module, which offers a few superpixel segmentation methods. The quickshift method is based on the same 'mode seeking' mechanism that mean shift is built on. None of these methods will segment out an entire object in an image; they provide extremely localized segmentation.
You can convert to some other color-space (e.g., Lab colorspace, using the following code) and segment on the colors (discarding intensity).
from skimage.color import rgb2lab
image = rgb2lab(image)
Then use your above code to tune the parameters (quantile and n_samples) of the function estimate_bandwidth() and finally use matplotlib's subplot to plot the segmented image as shown below:
import matplotlib.pyplot as plt

plt.figure()
plt.subplot(121), plt.imshow(image), plt.axis('off'), plt.title('original image', size=20)
plt.subplot(122), plt.imshow(np.reshape(labels, image.shape[:2])), plt.axis('off'), plt.title('segmented image with Meanshift', size=20)
plt.show()
to get the following output with the pepper image.
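Putting those pieces together, here is a minimal end-to-end sketch. It uses skimage's astronaut test image as a stand-in (the pepper image is not included here), downscales it so mean shift stays fast, and keeps only the a/b channels to discard intensity:
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2lab
from sklearn.cluster import MeanShift, estimate_bandwidth

image = data.astronaut()[::4, ::4]   # any RGB image; downscaled so mean shift stays fast
lab = rgb2lab(image)
flat = lab[..., 1:].reshape(-1, 2)   # keep only a and b (discard the L intensity channel)

bandwidth = estimate_bandwidth(flat, quantile=0.1, n_samples=100)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(flat)
labels = ms.labels_

plt.figure()
plt.subplot(121), plt.imshow(image), plt.axis('off'), plt.title('original image', size=20)
plt.subplot(122), plt.imshow(labels.reshape(image.shape[:2])), plt.axis('off'), plt.title('segmented image with Meanshift', size=20)
plt.show()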
The issue is that you are trying to display the labels directly; you should use the label map to convert the image into superpixels.
import numpy as np
import cv2
from sklearn.cluster import MeanShift, estimate_bandwidth
from skimage.color import label2rgb
#Loading original image
originImg = cv2.imread('Swimming_Pool.jpg')
# Shape of original image
originShape = originImg.shape
# Converting image into array of dimension [nb of pixels in originImage, 3]
# based on r g b intensities
flatImg=np.reshape(originImg, [-1, 3])
# Estimate bandwidth for meanshift algorithm
bandwidth = estimate_bandwidth(flatImg, quantile=0.1, n_samples=100)
ms = MeanShift(bandwidth = bandwidth, bin_seeding=True)
# Performing meanshift on flatImg
ms.fit(flatImg)
# (r,g,b) vectors corresponding to the different clusters after meanshift
labels=ms.labels_
# Remaining colors after meanshift
cluster_centers = ms.cluster_centers_
# Finding and displaying the number of clusters
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)
# Displaying segmented image
segmentedImg = np.reshape(labels, originShape[:2])
superpixels = label2rgb(segmentedImg, originImg, kind='avg')
cv2.imshow('Image', superpixels.astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
I am developing an image classifier using SVM. In the feature extraction phase, can I use PCA as the feature? How do I find the PCA of an image using Python and OpenCV? My plan is:
Find the PCA of each image in the training set and store it in an array (it may be a list of lists).
Store the class labels in another list.
Pass these as arguments to the SVM.
Am I going in the right direction? Please help me.
Yes, you can do PCA+SVM. Some might argue that PCA is not the best feature to use, or that SVM is not the best classification algorithm, but hey, having a good start is better than sitting around.
To do PCA with OpenCV, try something like this (I haven't verified the code, just to give you an idea):
import os
import cv2
import numpy as np

# Construct the input matrix
in_matrix = None
for f in os.listdir('dirpath'):
    # Read the image in as a gray level image. Some modifications
    # of the code are needed if you want to read it in as a color
    # image. For simplicity, let's use gray level images for now.
    im = cv2.imread(os.path.join('dirpath', f), cv2.IMREAD_GRAYSCALE)

    # Assume your images are all the same size, width w and height h.
    # If not, resize them to w * h first with cv2.resize(..)
    vec = im.reshape(w * h)

    # Stack them up to form the matrix
    if in_matrix is None:
        in_matrix = vec
    else:
        in_matrix = np.vstack((in_matrix, vec))

# PCA (PCACompute expects floating-point data)
if in_matrix is not None:
    in_matrix = np.asarray(in_matrix, dtype=np.float64)
    mean, eigenvectors = cv2.PCACompute(in_matrix, np.mean(in_matrix, axis=0).reshape(1, -1))
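To close the loop with the SVM part of your plan, here is a minimal sketch of the projection and training step (the labels list is hypothetical; mean and eigenvectors come from cv2.PCACompute above):
from sklearn.svm import SVC

# Project every image vector onto the principal components
features = cv2.PCAProject(in_matrix, mean, eigenvectors)

# Hypothetical: one class label per image, in the same row order as in_matrix
labels = ...

clf = SVC(kernel='linear')
clf.fit(features, labels)
From there you can evaluate with a train/test split (e.g. sklearn.model_selection.train_test_split) to see whether PCA features are good enough for your data.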