OpenCV, Python: How to use mask parameter in ORB feature detector - python

By reading a few answers on stackoverflow, I've learned this much so far:
The mask has to be a numpy array (which has the same shape as the image) with data type CV_8UC1 and have values from 0 to 255.
What is the meaning of these numbers, though? Is it that any pixels with a corresponding mask value of zero will be ignored in the detection process and any pixels with a mask value of 255 will be used? What about the values in between?
Also, how do I initialize a numpy array with data type CV_8UC1 in python? Can I just use dtype=cv2.CV_8UC1
Here is the code I am using currently, based on the assumptions I'm making above. But the issue is that I don't get any keypoints when I run detectAndCompute for either image. I have a feeling it might be because the mask isn't the correct data type. If I'm right about that, how do I correct it?
# convert images to grayscale
base_gray = cv2.cvtColor(self.base, cv2.COLOR_BGRA2GRAY)
curr_gray = cv2.cvtColor(self.curr, cv2.COLOR_BGRA2GRAY)
# initialize feature detector
detector = cv2.ORB_create()
# create a mask using the alpha channel of the original image--don't
# use transparent or partially transparent parts
base_cond = self.base[:,:,3] == 255
base_mask = np.array(np.where(base_cond, 255, 0))
curr_cond = self.base[:,:,3] == 255
curr_mask = np.array(np.where(curr_cond, 255, 0), dtype=np.uint8)
# use the mask and grayscale images to detect good features
base_keys, base_desc = detector.detectAndCompute(base_gray, mask=base_mask)
curr_keys, curr_desc = detector.detectAndCompute(curr_gray, mask=curr_mask)
print("base keys: ", base_keys)
# []
print("curr keys: ", curr_keys)
# []

So here is most, if not all, of the answer:
What is the meaning of those numbers
0 means to ignore the pixel and 255 means to use it. I'm still unclear on the values in between, but I don't think all nonzero values are considered "equivalent" to 255 in the mask. See here.
Also, how do I initialize a numpy array with data type CV_8UC1 in python?
The type CV_8U is the unsigned 8-bit integer, which, using numpy, is numpy.uint8. The C1 postfix means that the array is 1-channel, instead of 3-channel for color images and 4-channel for rgba images. So, to create a 1-channel array of unsigned 8-bit integers:
import numpy as np
np.zeros((480, 720), dtype=np.uint8)
(a three-channel array would have shape (480, 720, 3), four-channel (480, 720, 4), etc.) This mask would cause the detector and extractor to ignore the entire image, though, since it's all zeros.
how do I correct [the code]?
There were two separate issues, each separately causing each keypoint array to be empty.
First, I forgot to set the type for the base_mask
base_mask = np.array(np.where(base_cond, 255, 0)) # wrong
base_mask = np.array(np.where(base_cond, 255, 0), dtype=uint8) # right
Second, I used the wrong image to generate my curr_cond array:
curr_cond = self.base[:,:,3] == 255 # wrong
curr_cond = self.curr[:,:,3] == 255 # right
Some pretty dumb mistakes.
Here is the full corrected code:
# convert images to grayscale
base_gray = cv2.cvtColor(self.base, cv2.COLOR_BGRA2GRAY)
curr_gray = cv2.cvtColor(self.curr, cv2.COLOR_BGRA2GRAY)
# initialize feature detector
detector = cv2.ORB_create()
# create a mask using the alpha channel of the original image--don't
# use transparent or partially transparent parts
base_cond = self.base[:,:,3] == 255
base_mask = np.array(np.where(base_cond, 255, 0), dtype=np.uint8)
curr_cond = self.curr[:,:,3] == 255
curr_mask = np.array(np.where(curr_cond, 255, 0), dtype=np.uint8)
# use the mask and grayscale images to detect good features
base_keys, base_desc = detector.detectAndCompute(base_gray, mask=base_mask)
curr_keys, curr_desc = detector.detectAndCompute(curr_gray, mask=curr_mask)
TL;DR: The mask parameter is a 1-channel numpy array with the same shape as the grayscale image in which you are trying to find features (if image shape is (480, 720), so is mask).
The values in the array are of type np.uint8, 255 means "use this pixel" and 0 means "don't"
Thanks to Dan Mašek for leading me to parts of this answer.

Related

How to properly insert mask into original image?

My goal is to cover a face with circular noise (salt and pepper/black and white dots), however what I have managed to achieve is just rectangular noise.
Using this image:
I found the face coordinates (x,y,w,h) = [389, 127, 209, 209]
And using this function added noise Like:
img = cv2.imread('like_this.jpg')
x,y,w,h = [389, 127, 209, 209]
noised = add_noise(img[y:y+h,x:x+w])
new = img.copy()
new[y:y+h,x:x+w] = noised
cv2.imshow('new', new)
From x,y,w,h I found that I want my circle to be at (493, 231) with radius 105
I researched I found something about masking and bitwise operations, so I tried:
mask = np.zeros(new.shape[:2], dtype='uint8')
cv2.circle(mask, (493, 231), 105, 255, -1)
new_gray = cv2.cvtColor(new, cv2.COLOR_BGR2GRAY)
masked = cv2.bitwise_and(new_gray, new_gray , mask=mask)
cv2.imshow('masked', masked)
Here, the problem arises:
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
OR = cv2.bitwise_or(masked, img_gray) # removes the black dots, idk why
cv2.imshow('bitwise - OR', OR)
The black dots get removed from the noise and besides that, I can't seem to convert OR back to BGR.
Maybe there is a better way to do that.
Please, help/guidance needed!
So the issue is how to use masking. There are two options, numpy and OpenCV.
numpy
Since you copied the noisy area into the result and now want to restore everything outside of the circle, I'll use mask == 0 to get a boolean array that is true everywhere outside of the circle.
With numpy, boolean arrays can be used as indices. The result is a "view", it behaves like a slice. Operations through it affect the original data.
noised = add_noise(img[y:y+h,x:x+w])
new = img.copy()
new[y:y+h,x:x+w] = noised # whole rectangle affected
new[mask == 0] = img[mask == 0] # restore everything outside of the circle
All three arrays (mask, new, img) need to have the same shape.
OpenCV, masks
Not much point to it with Python and numpy available, but many of its C++ APIs take an optional mask argument that modifies that function's operation. Mat::copyTo() is one such method.
OpenCV, bitwise operations
With bitwise operations, the mask would no longer just label each pixel as true or false, but it would have to be 3-channels and all eight bits of every value count, so it must contain only 0 and 255 (0xFF).
I'll erase everything outside of the circle first, then add back the part of the source that is outside of the circle. Both operations use bitwise_and. Two operations are required because bitwise operations can't just "overwrite". They react to both operands. I'll also use numpy's ~ operator to bitwise-negate the mask.
bitwise_mask = cv.cvtColor(mask, cv.COLOR_GRAY2BGR) # blow it up
new = cv.bitwise_and(new, bitwise_mask)
new += cv.bitwise_and(img, ~bitwise_mask)
Bitwise opperations function on binary conditions turning "off" a pixel if it has a value of zero, and turning it "on" if the pixel has a value greater than zero.
In your case the bitwise_or "removes" both the black background of the mask and the black points of the noise.
I propose to do it like this:
# Read image
img = cv2.imread('like_this.jpg')
x,y,w,h = [389, 127, 209, 209]
# Add squared noise
noised = add_noise(img[y:y+h,x:x+w])
new = img.copy()
new[y:y+h,x:x+w] = noised
# Transform the image to graylevel
new_gray = cv2.cvtColor(new, cv2.COLOR_BGR2GRAY)
# Create a mask in which black noise is equal to 0 and white to 2
mask = np.zeros(new.shape[:2], dtype='uint8')
mask[new_gray==0] = 1
mask[new_gray==255] = 2
# Mask the previous mask with a circle
circle = np.zeros(new.shape[:2], dtype='uint8')
circle = cv2.circle(circle, (493, 231), 105, 1, -1)
mask = mask * circle
plt.imshow(mask)
plt.show()
# Join the mask adding the noise to the image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray[mask==1] = 0
gray[mask==2] = 255
cv2.imshow('Result', gray)
cv2.waitKey(0)
Hope it works. If you need more help let me know.

How to find most frequent pixels values of a set of RGB images, excluding black pixels?

I have a large set of also large images (5000,10000,3 channels, RGB) from a semantic segmentation process. I am trying to create a new image with the most "common" value for each pixel, the mode of each pixel for the complete set. Those images have some particularities. First of all, they have the same size, but sometimes contains black pixels that represent no information and must be excluded from the mode calculation. Merging together all image set, I will be able to define which pixel colour tuple (r,g,b) is the most common and store this information as a new image without black pixels.
I have tried using scipy stats.mode to analyse a list of np.array from the images, but this method does not count the (0,0,0) tuple as a nan_policy='omit', so after the calculation, it returns a black image. (0,0,0) is the most frequent pixel colour after all.
I tried also replacing the (0,0,0) tuple by a 'nan' value but the ram usage goes up really fast and is not efficient.
Could anyone give me a hint of some vectorised method to implement this stat calculation?
Thanks!
some sample images: img1img2img3img4
It sounds like you stored mixed tuples and nan values in a numpy array. This is not very effficient, because that would be an object array that needs to handle memory allocation separately for each pixel.
It is better to convert the each RGB tuple to a (integer) floating-point value. A single-precision float can store integers up to 2**24-1 without loss of precision; that is just enough for storing 24-bit RGB values.
Here is how to do it with 5 images of 50x100 pixels.
from scipy.stats import mode as stats_mode
ny, nx = 50, 100
imgs = np.random.randint(255, size=(5, ny, nx, 3), dtype=np.uint8)
imgs[:3, ny//2, nx//2, :] = 0 # ignore thsee
imgs[3:, ny//2, nx//2, :] = [255, 255, 254] # find this
my = 10 # slice size - must divide ny
mode_img = np.zeros((ny, nx, 3), dtype=np.uint8)
flt_imgs = np.zeros((5, my, nx), dtype=np.float32)
for iy in range(0, ny, my):
yslice = slice(iy, iy+my)
flt_imgs[:] = imgs[:, yslice, :, 0]*(256*256)
flt_imgs += imgs[:, yslice, :, 1]*256
flt_imgs += imgs[:, yslice, :, 2]
flt_imgs[flt_imgs == 0] = np.nan
mode_result = stats_mode(flt_imgs, axis=0, nan_policy='omit')
imode = mode_result.mode[0].astype(np.int32)
mode_img[yslice, :, 0] = (imode >> 16) & 0xff
mode_img[yslice, :, 1] = (imode >> 8) & 0xff
mode_img[yslice, :, 2] = imode & 0xff
print(f'Found mode: {mode_img[ny//2, nx//2]}')
Output:
Found mode: [255 255 254]

Cropping image from a binary mask

I am trying to use DeepLab v3 to detect object and mask where the actual object is.
DeepLab model produces a resized_im(3D) and a mask seg_map (2D) of 0 and non-zero values, 0 means it's the background.
Currently, it is only possible to plot an image with an overlay mask on the object. I want to crop the object out of the resized_im with transparent background. Is there any advice for the work?
You can play around with the notebook here:
https://colab.research.google.com/drive/138dTpcYfne40hqrb13n_36okSGYhrJnz?usp=sharing&hl=en#scrollTo=p47cYGGOQE1W&forceEdit=true&sandboxMode=true
I also tried here: How to crop image based on binary mask but none seems to work on my case
You just need to convert your segmentation mask to boolean numpy array, then multiply image by it. Don't forget that your image has 3 channels while mask has only 1. It may look something like that:
# seg_map - segmentation mask from network, resized_im - your input image
mask = np.greater(seg_map, 0) # get only non-zero positive pixels/labels
mask = np.expand_dims(mask, axis=-1) # (H, W) -> (H, W, 1)
mask = np.concatenate((mask, mask, mask), axis=-1) # (H, W, 1) -> (H, W, 3), (don't like it, so if you know how to do it better, please let me know)
crops = resized_im * mask # apply mask on image
You can use different logical numpy function if you want to choose certain labels, for example:
mask = np.equal(seg_map, 5) # to get only objects with label 5

How to check and convert image data types in OpenCV using Python

I'm using various methods in OpenCV to preprocess some images. I often get errors relating to the data type when passing objects been methods eg:
import cv2
import numpy as np
#import image and select ROI
image1 = cv2.imread('image.png')
roi_1 = cv2.selectROI(image1) # spacebar to confirm selection
cv2.waitKey(0)
cv2.destroyAllWindows()
# preprocesssing
imCrop_1 = image1[int(roi_1[1]):int(roi_1[1]+roi_1[3]), int(roi_1[0]):int(roi_1[0]+roi_1[2])]
grey1 = cv2.cvtColor(imCrop_1, cv2.COLOR_RGB2GRAY)
thresh, bw_1 = cv2.threshold(grey1, 200, 255, cv2.THRESH_OTSU)
canny_edge1 = cv2.Canny(bw_1, 50, 100)
#test=roi_1 # Doesn't work with error: /home/bprodz/opencv/modules/photo/src/denoising.cpp:182: error: (-5) Type of input image should be CV_8UC3 or CV_8UC4! in function fastNlMeansDenoisingColored
#test = imCrop_1 # works
#test = grey1 # doesn't work with error above
#test = bw_1 # doesn't work with error above
#test = canny_edge1 # doesn't work with error above
dst = cv2.fastNlMeansDenoisingColored(test,None,10,10,7,21)
# Check object types
type(imCrop_1) # returns numpy.ndarray - would like to see ~ CV_8UC3 etc.
type(grey1) # returns numpy.ndarray
Presently I just use trial and error, is a there a more methodical approach that I can use for checking and converting between different object types?
You are probably using wrong method, for this purpose, you may also get a hint from the method name you are using, as per documentation of cv2. fastNlMeansDenoisingColored:
src – Input 8-bit 3-channel image.
dst – Output image with the same size and type as src .
So, if want to use cv2. fastNlMeansDenoisingColored then you need to convert the src mat to a 3-channel matrix which can be done as:
cv2.cvtColor(src, cv2.COLOR_GRAY2BGR)
But if you have gray-scale image, then you may use cv2.fastNlMeansDenoising, which accepts the both single channel and three channel as source mats, and save the step for converting the image.
You can also use img.shape to check the number of channels for a given matrix. it would return (100, 100) for a gray-scale matrix, (100, 100, 3) for 3-channel matrix and (100, 100, 4) for 4-channel matrix. You can also get the type of matrix using img.dtype

explain arguments meaning in res = cv2.bitwise_and(img,img,mask = mask)

I am trying to extract blue colour of an input image. For that I create a blue HSV colour boundary and threshold HSV image by using the command
mask_img = cv2.inRange(hsv, lower_blue, upper_blue)
After that I used a bitwise_and on the input image and the threshold image by using
res = cv2.bitwise_and(img, img, mask = mask_img)
Where img is the input image. I got this code from opencv. But I didn't understand why are three arguments used in bitwise_and and what actually each arguments mean? Why the same image is used at src1 and src2 ?
And also what is the use of mask keyword here? Please help me to find out the answer
The basic concept behind this is the value of color black ,it's value is 0 in OPEN_CV.So black + anycolor= anycolor because value of black is 0.
Now suppose we have two images one is named img1 and other is img2.
img2 contains a logo which we want to place on the img1. We create threshold and then the mask and mask_inv of img2,and also create roi of img1.
Now we have to do two things to add the logo of img2 on img1.
We create background of roi as img1_bg with help of : mask_inv,mask_inv will have two region one black and one white, in the white region we will put img1 part and leave black as it is-
img1_bg = cv2.bitwise_and(roi,roi,mask = mask_inv)
In your question you have used directly the mask of the img created
res = cv2.bitwise_and(img,img,mask = mask_img)
and in img2 we need to create the logo as foreground of roi ,
img2_fg = cv2.bitwise_and(img2,img2,mask = mask)
here we have used mask layer , the logo part of img2 gets filled in the white part of mask
Now when we add both we get a perfect combined roi
For full description and understanding visit:
OPEN CV CODE FILES AND FULL DESCRIPTION
The operation of "And" will be performed only if mask[i] doesn't equal zero, else the the result of and operation will be zero. The mask should be either white or black image with single channel. you can see this link
http://docs.opencv.org/2.4.13.2/modules/core/doc/operations_on_arrays.html?highlight=bitwise#bitwise-and
what is actually each arguments mean?
res = cv2.bitwise_and(img,img,mask = mask_img)
src1: the first image (the first object for merging)
src2: the second image (the second object for merging)
mask: understood as rules to merge. If region of image (which is gray-scaled, and then masked) has black color (valued as 0), then it is not combined (merging region of the first image with that of the second one), vice versa, it will be carried out. In your code, referenced image is "mask_img".
In my case, my code is correct, when it makes white + anycolor = anycolor
import cv2
import numpy as np
# Load two images
img1 = cv2.imread('bongSung.jpg')
img2 = cv2.imread('opencv.jpg')
# I want to put logo on top-left corner, so I create a ROI
rows, cols, channels = img2.shape
roi = img1[0:rows, 0:cols]
# NOw we need to create a mask of the logo, mask is conversion to grayscale of an image
img2gray = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(img2gray, 220, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('mask', mask)
mask_inv = cv2.bitwise_not(mask)
#cv2.imshow("mask_inv", mask_inv)
#When using bitwise_and() in opencv with python then white + anycolor = anycolor; black + anycolor = black
img1_bg = cv2.bitwise_and(roi,roi,mask = mask_inv)
#cv2.imshow("img1_bg", img1_bg)
cv2.imshow("img2", img2)
img2_fg = cv2.bitwise_and(img2,img2,mask = mask)
cv2.imshow('img2_fg', img2_fg)
dst = cv2.add(img1_bg,img2_fg)
img1[0:rows, 0:cols] = dst
#cv2.imshow("Image", img1)
cv2.waitKey(0)
cv2.destroyAllWindows()
From above answers we may know the definitions of the parameters of bitwise_and(), but they all do not answer the other question
Why the same image is used at src1 and src2 ?
This question should be caused by the too simplified function definition in the document of OpenCV, it may be ambiguous to some people, in the document the bitwise_and() is defined as
dst(I)=sur1(I) ^ sur2(I), if mask(I) != 0, where ^ represents the 'and' operator
from this definition at first sight I cannot get the picture about how to process the dst(I) when the mask(I) is 0.
From the test result, I think that it should give a more clear function definition as
dst(I)=sur1(I) ^sur2(I), if mask(I) != 0,
otherwise the dst(I) keep its original value and the default value of all elements of the dst array is 0.
Now we may know that using the same image for sur1 and sur2, it will only keep the original image parts in the area of mask(I) !=0 and the other area shows the part of the dst image (as the mask shape)
Additionally for other bitwise operations the definitions should be the same as above, they also need to add the otherwise condition and the default value description of the dst array
The below link explains clearly the bitwise operation and also the significance of each parameters.
http://opencvexamples.blogspot.com/2013/10/bitwise-and-or-xor-and-not.html
void bitwise_and(InputArray src1, InputArray src2, OutputArray dst, InputArray mask=noArray())
Calculates the per-element bit-wise conjunction of two arrays or an array and a scalar.
Parameters:
src1 – first input array or a scalar.
src2 – second input array or a scalar.
src – single input array.
value – scalar value.
dst – output array that has the same size and type as the input arrays.
mask – optional operation mask, 8-bit single channel array, that specifies elements of the output array to be changed
Regarding using img twice, my guess is that we don't really care what img[i] and img[i] is, as it's just img[i] for binary. What matters is that, as mentioned by Mohammed Awney, when the mask is 0, we make img[i] be 0, otherwise we leave the pixel alone. This is a way to make certain pixels in img black, according to our mask.
bitwise_and ( InputArray src1,
InputArray src2,
OutputArray dst,
InputArray mask = noArray()
)
src1 first input array or a scalar.
src2 second input array or a scalar.
dst output array that has the same size and type as the input arrays.
mask optional operation mask, 8-bit single channel array, that specifies elements of the output array to be changed.
dst(I)=src1(I)∧src2(I)if mask(I)≠0
and mask operate on dst
computes bitwise conjunction of the two arrays (dst = src1 & src2) Calculates the per-element bit-wise conjunction of two arrays or an array and a scalar.

Categories