One of the biggest challenges in Tesseract OCR text recognition is uneven illumination of the input images.
I need an algorithm that can decide whether an image contains uneven illumination or not.
Test Images
I have attached three images: one with no illumination problem, one with glare (a white spot), and one containing a shadow.
Given an image, the algorithm should assign it to one of two classes:
No uneven illumination - the image with no illumination problem falls into this category.
Uneven illumination - the glare image (white spot) and the shadow image fall into this category.
No Uneven Illumination Image - Category A
Uneven Illumination Image (glare, white-spotted image) - Category B
Uneven Illumination Image (image containing a shadow) - Category B
Initial Approach
Change the colour space to HSV
Analyse the histogram of the value channel of HSV to identify uneven illumination.
Instead of the first two steps, we can use the perceived brightness channel instead of the value channel of HSV
Set a low threshold and count the pixels below it
Set a high threshold and count the pixels above it
Use the percentage of low pixels and the percentage of high pixels to detect uneven lighting conditions (this also requires setting thresholds for the percentages)
But I could not find strong similarities between the uneven illumination images. With histogram analysis I only found that some pixels have low values and some pixels have high values.
My idea is to set a low threshold and count how many pixels fall below it, and to set a high threshold and count how many pixels lie above it. Can we use these pixel counts to detect uneven lighting conditions in images? To do so we need to fix the two threshold values and the percentage of pixels that triggers the decision.
import cv2
import numpy as np
from matplotlib import pyplot as plt
def show_hist_v(img_path):
    img = cv2.imread(img_path)
    hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv_img)
    # calcHist expects a list of images; 256 bins cover the values 0..255
    histr = cv2.calcHist([v], [0], None, [256], [0, 256])
    plt.plot(histr)
    plt.show()
    # pixel counts below the low threshold and above the high threshold
    low_threshold = np.count_nonzero(v < 50)
    high_threshold = np.count_nonzero(v > 200)
    total_pixels = img.shape[0] * img.shape[1]
    percent_low = low_threshold / total_pixels * 100
    percent_high = high_threshold / total_pixels * 100
    # arguments are passed in the same order as the labels in the message
    print("Total Pixels - {}\n Pixels More than 200 - {} \n Pixels Less than 50 - {} \n Pixels percentage more than 200 - {} \n Pixels percentage less than 50 - {} \n".format(total_pixels, high_threshold, low_threshold, percent_high, percent_low))
    return total_pixels, high_threshold, low_threshold, percent_low, percent_high
So can someone improve my initial approach, or suggest a better approach to detect uneven illumination in images for general cases?
I also tried perceived brightness instead of the value channel. Since the value channel takes the maximum of the (b, g, r) values, I think perceived brightness is a better choice.
def get_perceive_brightness(float_img):
    float_img = np.float64(float_img)  # uint8 would overflow when squared
    b, g, r = cv2.split(float_img)
    float_brightness = np.sqrt(
        (0.241 * (r ** 2)) + (0.691 * (g ** 2)) + (0.068 * (b ** 2)))
    brightness_channel = np.uint8(np.absolute(float_brightness))
    return brightness_channel
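To illustrate the difference between the two channels, here is a minimal sketch using NumPy only. The helper names `value_channel` and `perceived_brightness` are made up for this example; the weights are the ones used in the question. It compares a pure-blue and a pure-green pixel, which the HSV value channel cannot tell apart.

```python
import numpy as np

def value_channel(bgr):
    # HSV value is the per-pixel maximum over the three colour channels
    return bgr.max(axis=2)

def perceived_brightness(bgr):
    # weighted root-sum-of-squares, same weights as in the question
    b, g, r = [bgr[..., i].astype(np.float64) for i in range(3)]
    return np.sqrt(0.241 * r**2 + 0.691 * g**2 + 0.068 * b**2)

# one pure-blue and one pure-green pixel, in OpenCV's BGR order
pixels = np.array([[[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
v = value_channel(pixels)
p = perceived_brightness(pixels)
print(v)            # both pixels get the same value
print(p.round(1))   # blue comes out much darker than green
```

Both pixels have value 255, while the perceived brightness of the blue pixel is roughly a third of that of the green one.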
def show_hist_v(img_path):
    img = cv2.imread(img_path)
    v = get_perceive_brightness(img)
    # calcHist expects a list of images; 256 bins cover the values 0..255
    histr = cv2.calcHist([v], [0], None, [256], [0, 256])
    plt.plot(histr)
    plt.show()
    # pixel counts below the low threshold and above the high threshold
    low_threshold = np.count_nonzero(v < 50)
    high_threshold = np.count_nonzero(v > 200)
    total_pixels = img.shape[0] * img.shape[1]
    percent_low = low_threshold / total_pixels * 100
    percent_high = high_threshold / total_pixels * 100
    # arguments are passed in the same order as the labels in the message
    print("Total Pixels - {}\n Pixels More than 200 - {} \n Pixels Less than 50 - {} \n Pixels percentage more than 200 - {} \n Pixels percentage less than 50 - {} \n".format(total_pixels, high_threshold, low_threshold, percent_high, percent_low))
    return total_pixels, high_threshold, low_threshold, percent_low, percent_high
Histogram analysis of perceived brightness channel
As Ahmet suggested.
def get_percentage_of_binary_pixels(img=None, img_path=None):
    if img is None:
        if img_path is None:
            raise ValueError("No img or img_path")
        gray_img = cv2.imread(img_path, 0)
    elif len(img.shape) > 2:
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    else:
        gray_img = img
    h, w = gray_img.shape
    guassian_blur = cv2.GaussianBlur(gray_img, (5, 5), 0)
    thresh_value, otsu_img = cv2.threshold(guassian_blur, 0, 255,
                                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # save the binarized image only when a source path is available
    if img_path is not None:
        cv2.imwrite("binary/{}".format(img_path.split('/')[-1]), otsu_img)
    black_pixels = np.count_nonzero(otsu_img == 0)
    # white_pixels = np.count_nonzero(otsu_img == 255)
    black_pixels_percentage = black_pixels / (h * w) * 100
    # white_pixels_percentage = white_pixels / (h * w) * 100
    return black_pixels_percentage
When the black-pixel percentage after Otsu binarization is above 35%, we can detect uneven illumination images with roughly 80% accuracy. When the uneven illumination occurs in only a small region of the image, the detection fails.
Thanks in advance
I suggest using the division trick to separate text from the background, and then calculating statistics on the background only. After setting some reasonable thresholds it is easy to create a classifier for the illumination.
def get_image_stats(img_path, lbl):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (25, 25), 0)
    no_text = gray * ((gray/blurred)>0.99)                     # select background only
    no_text[no_text<10] = no_text[no_text>20].mean()           # convert black pixels to mean value
    no_bright = no_text.copy()
    no_bright[no_bright>220] = no_bright[no_bright<220].mean() # disregard bright pixels
    print(lbl)
    std = no_bright.std()
    print('STD:', std)
    bright = (no_text>220).sum()
    print('Brigth pixels:', bright)
    plt.figure()
    plt.hist(no_text.reshape(-1,1), 25)
    plt.title(lbl)
    if std>25:
        print("!!! Detected uneven illumination")
    if no_text.mean()<200 and bright>8000:
        print("!!! Detected glare")
This results in:
good_img
STD: 11.264569863071165
Brigth pixels: 58
glare_img
STD: 15.00149131296984
Brigth pixels: 15122
!!! Detected glare
uneven_img
STD: 57.99510339944441
Brigth pixels: 688
!!! Detected uneven illumination
Now let's analyze the histograms and apply some common sense. We expect the background to be even and to have low variance, as is the case in "good_img". If the background has high variance, its standard deviation is high, which indicates uneven brightness. In the lower image you can see 3 (smaller) peaks that correspond to the 3 differently illuminated areas. The largest peak in the middle is the result of setting all black pixels to the mean value. I believe it is safe to classify images with STD above 25 as the "uneven illumination" case.
It is easy to spot a high number of bright pixels when there is glare (see the image on the right). A glared image looks like a good image, apart from the hot spot. Setting the threshold of bright pixels to something like 8000 (1.5% of total image size) should be good enough to detect such images. It is possible that the background is very bright everywhere, so if the mean of the no_text pixels is above 200, that is the case and there is no need to detect hot spots.
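The decision rule described above can be written down as a small pure function. This is only a sketch: the name `classify_background` and its parameters are made up for illustration, and the glare threshold is expressed as a fraction of the image size (1.5%) instead of the fixed count of 8000.

```python
def classify_background(std, bright_pixels, mean, total_pixels,
                        std_limit=25.0, bright_fraction=0.015, bright_mean=200.0):
    """Return the list of problems detected from background-only statistics."""
    problems = []
    # a high spread of background values indicates uneven illumination
    if std > std_limit:
        problems.append("uneven illumination")
    # skip the hot-spot test when the whole background is already very bright
    if mean < bright_mean and bright_pixels > bright_fraction * total_pixels:
        problems.append("glare")
    return problems

# STD and bright-pixel counts are the ones reported above;
# the mean (180) and image size (533000 pixels) are assumed values
print(classify_background(11.26, 58, 180, 533000))
print(classify_background(15.00, 15122, 180, 533000))
print(classify_background(57.99, 688, 180, 533000))
```

Expressing the bright-pixel threshold as a fraction keeps the rule usable for images of other sizes, but the fraction itself would still need tuning on real data.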
Why don't you remove the lighting effect from the images?
For instance:
If we try to read the image with pytesseract, the output will be ' \n\f'
But if we remove the lighting effect:
import cv2
import pytesseract
img = cv2.imread('img2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
smooth = cv2.GaussianBlur(gray, (95, 95), 0)
division = cv2.divide(gray, smooth, scale=192)
Reading the result with pytesseract then gives (part of the output):
.
.
.
Dosage & use
See package insert for compicic
information,
Instruction:
Keep all medicines out of the re.
Read the instructions carefully
Storage:
Store at temperature below 30°C.
Protect from Heat, light & moisture. BATCH NO. : 014C003
MFG. DATE - 03-2019
—— EXP. DATE : 03-2021
GENIX Distributed
AS Exclusi i :
genx PHARMA PRIVATE LIMITED Cevoka Pv 2 A ‘<
» 45-B, Kore ci
Karachi-75190, | Pakisier al Pei yaa fans
www.genixpharma.com
Repeating the same steps for the last image:
Reading the result with pytesseract then gives (part of the output):
.
.
.
Dosage & use
See package insert for complete prescribing
information. Rx Only
Instruction:
Keep all medicines out of the reach of children.
Read the instructions carefully before using.
Storage:
Store at temperature below 30°C. 5
Protect from Neat, light & moisture. BATCH NO, : 0140003
MFG. DATE : 03-2019
EXP. DATE : 03-2021
Manufactured by:
GENI N Exclusively Distributed by:
GENIX PHARMA PRIVATE LIMITED Ceyoka (Pvt) Ltd.
44, 45-B, Korangi Creek Road, 55, Negombe Road,
Karachi-75190, Pakistan. Peliyagoda, Snianka,
www. genixpharma.com
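To see why dividing by a blurred copy evens out the lighting, here is a small synthetic sketch using NumPy only. It is not the code used for the results above: `box_blur` is a hypothetical helper that stands in for `cv2.GaussianBlur`, and the "page" is an artificial left-to-right brightness gradient with dark lines as text.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter with a k x k window (k odd), edges padded by replication."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    # integral image: every window sum becomes four lookups
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

h, w = 120, 200
illumination = np.tile(np.linspace(80, 240, w), (h, 1))  # darker on the left
page = illumination.copy()
page[5::10, :] *= 0.3                                    # dark "text" lines

smooth = box_blur(page, 31)                              # estimate of the lighting
normalized = np.clip(192.0 * page / smooth, 0, 255)      # same scale as above

background = np.ones((h, w), dtype=bool)
background[5::10, :] = False
raw_std = page[background].std()
norm_std = normalized[background].std()
print(raw_std, norm_std)
```

On this synthetic page the background spread drops sharply after the division, because the slowly varying lighting appears in both the numerator and the denominator and largely cancels.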
Update
You can find the illuminated part using erosion and dilation.
Result:
Code:
import cv2
import imutils
import numpy as np
from skimage import measure
from imutils import contours
img = cv2.imread('img2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (95, 95), 0)
# keep only the very bright regions, then clean up small blobs
thresh = cv2.threshold(blurred, 200, 255, cv2.THRESH_BINARY)[1]
thresh = cv2.erode(thresh, None, iterations=2)
thresh = cv2.dilate(thresh, None, iterations=4)
# connectivity=2 is 8-connectivity in 2D (replaces the removed neighbors=8 argument)
labels = measure.label(thresh, connectivity=2, background=0)
mask = np.zeros(thresh.shape, dtype="uint8")
for label in np.unique(labels):
    # label 0 is the background
    if label == 0:
        continue
    labelMask = np.zeros(thresh.shape, dtype="uint8")
    labelMask[labels == label] = 255
    numPixels = cv2.countNonZero(labelMask)
    # keep only components that are large enough
    if numPixels > 300:
        mask = cv2.add(mask, labelMask)
cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
cnts = contours.sort_contours(cnts)[0]
# draw a numbered circle around every bright region
for (i, c) in enumerate(cnts):
    (x, y, w, h) = cv2.boundingRect(c)
    ((cX, cY), radius) = cv2.minEnclosingCircle(c)
    cv2.circle(img, (int(cX), int(cY)), int(radius),
               (0, 0, 255), 3)
    cv2.putText(img, "#{}".format(i + 1), (x, y - 15),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
cv2.imshow("Image", img)
cv2.waitKey(0)
Though I only tested with the second-image. You may need to change the parameters for the other images.
Here is a quick solution in ImageMagick. But it can easily be implemented in Python/OpenCV as shown further down.
Use division normalization.
Read the input
Optionally convert to grayscale
Copy the image and blur it
Divide the original image by the blurred copy
Save the results
Input:
convert 8W0bp.jpg \( +clone -blur 0x13 \) +swap -compose divide -composite x1.png
convert ob87W.jpg \( +clone -blur 0x13 \) +swap -compose divide -composite x2.png
convert HLJuA.jpg \( +clone -blur 0x13 \) +swap -compose divide -composite x3.png
Results:
In Python/OpenCV:
import cv2
import numpy as np
import skimage.filters as filters
# read the image
img = cv2.imread('8W0bp.jpg')
#img = cv2.imread('ob87W.jpg')
#img = cv2.imread('HLJuA.jpg')
# convert to gray
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# blur
smooth = cv2.GaussianBlur(gray, (33,33), 0)
# divide gray by the blurred image
division = cv2.divide(gray, smooth, scale=255)
# sharpen using unsharp masking (grayscale input, so no channel argument is needed)
sharp = filters.unsharp_mask(division, radius=1.5, amount=2.5, preserve_range=False)
sharp = (255*sharp).clip(0,255).astype(np.uint8)
# save results
cv2.imwrite('8W0bp_division.jpg',division)
cv2.imwrite('8W0bp_division_sharp.jpg',sharp)
#cv2.imwrite('ob87W_division.jpg',division)
#cv2.imwrite('ob87W_division_sharp.jpg',sharp)
#cv2.imwrite('HLJuA_division.jpg',division)
#cv2.imwrite('HLJuA_division_sharp.jpg',sharp)
# show results
cv2.imshow('smooth', smooth)
cv2.imshow('division', division)
cv2.imshow('sharp', sharp)
cv2.waitKey(0)
cv2.destroyAllWindows()
Results:
Here is my pipeline:
%matplotlib inline
import numpy as np
import cv2
from matplotlib import pyplot as plt
from scipy.signal import find_peaks
I use the functions:
def get_perceived_brightness(float_img):
    float_img = np.float64(float_img)  # uint8 would overflow when squared
    b, g, r = cv2.split(float_img)
    float_brightness = np.sqrt((0.241 * (r ** 2)) + (0.691 * (g ** 2)) + (0.068 * (b ** 2)))
    brightness_channel = np.uint8(np.absolute(float_brightness))
    return brightness_channel
# from: https://stackoverflow.com/questions/46300577/find-locale-minimum-in-histogram-1d-array-python
def smooth(x, window_len=11, window='hanning'):
    if x.ndim != 1:
        raise ValueError("smooth only accepts 1 dimension arrays.")
    if x.size < window_len:
        raise ValueError("Input vector needs to be bigger than window size.")
    if window_len < 3:
        return x
    if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
        raise ValueError("Window is one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
    # mirror the signal at both ends to reduce boundary effects
    s = np.r_[x[window_len-1:0:-1], x, x[-2:-window_len-1:-1]]
    if window == 'flat':  # moving average
        w = np.ones(window_len, 'd')
    else:
        # look up the window function by name instead of using eval
        w = getattr(np, window)(window_len)
    y = np.convolve(w/w.sum(), s, mode='valid')
    return y
I load the image
image_file_name = 'im3.jpg'
image = cv2.imread(image_file_name)
# image category
category = 0
# gray convertion
image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
height = image.shape[0]
width = image.shape[1]
First test. Does the image have any big white spots?
# First test. Does the image have any big white spots?
saturation_thresh = 250
raw_saturation_region = cv2.threshold(image_gray, saturation_thresh, 255, cv2.THRESH_BINARY)[1]
num_raw_saturation_regions, raw_saturation_regions, stats, _ = cv2.connectedComponentsWithStats(raw_saturation_region)
# index 0 is the background -> to remove
area_raw_saturation_regions = stats[1:, cv2.CC_STAT_AREA]
min_area_bad_spot = 1000 # this can be calculated as percentage of the image area
# np.max fails on an empty array, so check that at least one region exists
if area_raw_saturation_regions.size > 0 and np.max(area_raw_saturation_regions) > min_area_bad_spot:
    category = 2 # there is at least one spot
The result for the image normal:
The result for the image with spots:
The result for the image with shadows:
If the image passes the first test, I run the second test. Is the image dark?
# Second test. Is the image dark?
min_mean_intensity = 60
if category == 0 :
    mean_intensity = np.mean(image_gray)
    if (mean_intensity < min_mean_intensity):
        category = 3 # dark image
If the image also passes the second test, I run the third test. Is the image uniformly illuminated?
window_len = 15 # odd number
delay = int((window_len-1)/2)  # delay is the shift introduced by the smoothing. It's half of window_len
# for example if the window_len is 15, the delay is 7
# in fact hist.shape = 256 and smoothed_hist.shape = 270 (= 256 + 2*delay)
if category == 0 :
    perceived_brightness = get_perceived_brightness(image)
    hist,bins = np.histogram(perceived_brightness.ravel(),256,[0,256])
    # smoothed_hist is shifted with respect to the original one
    smoothed_hist = smooth(hist,window_len)
    # smoothed histogram synchronized with the original histogram
    sync_smoothed_hist = smoothed_hist[delay:-delay]
    # if there is more than one peak with:
    #    20<bin<250
    #    prominence >= mean histogram value
    # the image could have shadows (but it could also have a background with some colors)
    mean_hist = int(height*width / 256)
    peaks, _ = find_peaks(sync_smoothed_hist, prominence=mean_hist)
    selected_peaks = peaks[(peaks > 20) & (peaks < 250)]
    if (selected_peaks.size>1) :
        category = 4 # there are shadows
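The shift introduced by the smoothing can be checked on a toy histogram. The sketch below repeats the mirror-padding and convolution steps of `smooth` inline (so it runs on its own) and uses a single spike at bin 100, which is an arbitrary choice for illustration.

```python
import numpy as np

window_len = 15
delay = (window_len - 1) // 2

hist = np.zeros(256)
hist[100] = 1.0   # a single spike makes the shift easy to see

# mirror-pad both ends, then convolve with a normalised Hanning window
padded = np.r_[hist[window_len - 1:0:-1], hist, hist[-2:-window_len - 1:-1]]
w = np.hanning(window_len)
smoothed = np.convolve(w / w.sum(), padded, mode="valid")
synced = smoothed[delay:-delay]

print(hist.size, smoothed.size, synced.size)   # 256 270 256
print(smoothed.argmax(), synced.argmax())      # 107 100
```

The smoothed array is `2 * delay` samples longer than the input and its peak sits `delay` bins too far to the right; trimming `delay` samples from each end restores both the length and the peak position.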
The histogram for the image normal:
The histogram for the image with spots:
The histogram for the image with shadows:
If the image passes all the tests, then it is normal
# all tests are passed. The image is ok
if (category == 0) :
    category=1 # the image is ok
Related
For my class project I am trying to extract ridges and valleys from a finger image. An example is given below.
#The code I am using
import math
import cv2
import numpy as np
import fingerprint_enhancer
clip_hist_percent=25
image = cv2.imread("")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Calculate grayscale histogram
hist = cv2.calcHist([gray],[0],None,[256],[0,256])
hist_size = len(hist)
# Calculate cumulative distribution from the histogram
accumulator = []
accumulator.append(float(hist[0]))
for index in range(1, hist_size):
    accumulator.append(accumulator[index -1] + float(hist[index]))
# Locate points to clip
maximum = accumulator[-1]
clip_hist_percent *= (maximum/100.0)
clip_hist_percent /= 2.0
# Locate left cut
minimum_gray = 0
while accumulator[minimum_gray] < clip_hist_percent:
    minimum_gray += 1
# Locate right cut
maximum_gray = hist_size -1
while accumulator[maximum_gray] >= (maximum - clip_hist_percent):
    maximum_gray -= 1
# Calculate alpha and beta values
alpha = 255 / (maximum_gray - minimum_gray)
beta = -minimum_gray * alpha
auto_result = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
gray = cv2.cvtColor(auto_result, cv2.COLOR_BGR2GRAY)
# compute gamma = log(mid*255)/log(mean)
mid = 0.5
mean = np.mean(gray)
gamma = math.log(mid*255)/math.log(mean)
# do gamma correction
img_gamma1 = np.power(auto_result,gamma).clip(0,255).astype(np.uint8)
g1 = cv2.cvtColor(img_gamma1, cv2.COLOR_BGR2GRAY)
# blur = cv2.GaussianBlur(g1,(2,1),0)
thresh2 = cv2.adaptiveThreshold(g1, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 199, 3)
# blur = cv2.GaussianBlur(thresh2,(2,1),0)
blur=((3,3),1)
erode_=(5,5)
dilate_=(3, 3)
dilate = cv2.dilate(cv2.erode(cv2.GaussianBlur(thresh2/255, blur[0],
blur[1]), np.ones(erode_)), np.ones(dilate_))*255
out = fingerprint_enhancer.enhance_Fingerprint(dilate)
I am having difficulty extracting the lines on the finger. I tried adjusting the brightness and contrast, then applied calcHist, adaptive thresholding, and blur, and finally the Gabor filters (as in Utkarsh's code). The result looks like the image above.
We could clearly see that the lower part of the image has many spurious lines. My project requirement is to get clear lines from the RGB image. Could anyone help me with the steps and the code?
Thank you in advance
reference:
https://github.com/Utkarsh-Deshmukh/Fingerprint-Enhancement-Python
https://ieeexplore.ieee.org/abstract/document/7358782
There are several strange things (IMO) about your code.
First you do a contrast stretch that sets the 12.5% darkest pixels to black and the 12.5% brightest pixels to white. You probably already have this number of white pixels, so not much happens there, but you do remove all the information in the darkest region of the finger print.
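The effect of that clipping can be reproduced on synthetic data. The sketch below is an approximation, not the code from the question: it finds the two cut points with `np.percentile` instead of the cumulative-histogram loops, and it uses a uniformly random image as a stand-in for the fingerprint.

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(200, 200)).astype(np.uint8)  # stand-in image

clip_hist_percent = 25
# half of the clipped percentage goes to each end of the histogram
low_cut, high_cut = np.percentile(
    gray, [clip_hist_percent / 2, 100 - clip_hist_percent / 2])

alpha = 255.0 / (high_cut - low_cut)
beta = -low_cut * alpha
stretched = np.clip(gray * alpha + beta, 0, 255).astype(np.uint8)

frac_black = (stretched == 0).mean()
frac_white = (stretched == 255).mean()
print(frac_black, frac_white)
```

Both fractions come out at roughly one eighth of the pixels, which is the information that the stretch throws away at each end.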
Next you threshold. Here you remove most of the remaining information. Thresholding is something you should leave until the very last step of any processing. In particular, the algorithm implemented in fingerprint_enhancer.enhance_Fingerprint() takes a gray-scale image as input. You should not binarize its input at all!
I would start with a local contrast stretch, then you can directly apply the enhancement algorithm:
import cv2
import fingerprint_enhancer
image = cv2.imread("zMxbO.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply local contrast stretch
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (25, 25)) # larger than the width of the widest ridges
low = cv2.morphologyEx(gray, cv2.MORPH_OPEN, se) # locally lowest grayvalue
high = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, se) # locally highest grayvalue
gray = (gray - low) / (high - low + 1e-6)
# Apply fingerprint enhancement
out = fingerprint_enhancer.enhance_Fingerprint(gray, resize=True)
The local contrast stretch yields this:
The finger print enhancement algorithm now yields this:
Note that things go wrong around the edges, where the background was cut out and replaced with white, as well as in the dark region, where the noise dominates and the enhancement algorithm hallucinates a bit. I don't think you can extract meaningful information from that area; better illumination would be necessary.
I am trying to detect blurred images. Thanks to this post, I managed to create a script using the Fast Fourier Transform, and so far it has worked quite well. But for some photos I am not able to get correct results.
When the background is almost the same color as the objects in the foreground, I think my script is not able to give good results.
Do you have any leads on how to correct this?
import cv2
import imutils
from PIL import Image as pilImg
from IPython.display import display
import numpy as np
from matplotlib import pyplot as plt
def detect_blur_fft(image, size=60, thresh=17, vis=False):
    """
    Detects blur by measuring how much high-frequency content is left after the low frequencies are removed with an FFT
    :param image: The image to detect blur in
    :param size: the dimension of the smaller square extracted from the image, defaults to 60 (optional)
    :param thresh: the lower this value, the more blur is acceptable, defaults to 17 (optional)
    :param vis: Whether or not to display the input image and its magnitude spectrum, defaults to False
    (optional)
    """
    # grab the dimensions of the image and use the dimensions to
    # derive the center (x, y)-coordinates
    (h, w) = image.shape
    (cX, cY) = (int(w / 2.0), int(h / 2.0))
    # compute the FFT to find the frequency transform, then shift
    # the zero frequency component (i.e., DC component located at
    # the top-left corner) to the center where it will be more
    # easy to analyze
    fft = np.fft.fft2(image)
    fftShift = np.fft.fftshift(fft)
    # check to see if we are visualizing our output
    if vis:
        # compute the magnitude spectrum of the transform
        magnitude = 20 * np.log(np.abs(fftShift))
        # display the original input image
        (fig, ax) = plt.subplots(1, 2, )
        ax[0].imshow(image, cmap="gray")
        ax[0].set_title("Input")
        ax[0].set_xticks([])
        ax[0].set_yticks([])
        # display the magnitude image
        ax[1].imshow(magnitude, cmap="gray")
        ax[1].set_title("Magnitude Spectrum")
        ax[1].set_xticks([])
        ax[1].set_yticks([])
        # show our plots
        plt.show()
    # zero-out the center of the FFT shift (i.e., remove low
    # frequencies), apply the inverse shift such that the DC
    # component once again becomes the top-left, and then apply
    # the inverse FFT
    fftShift[cY - size:cY + size, cX - size:cX + size] = 0
    fftShift = np.fft.ifftshift(fftShift)
    recon = np.fft.ifft2(fftShift)
    # compute the magnitude spectrum of the reconstructed image,
    # then compute the mean of the magnitude values
    magnitude = 20 * np.log(np.abs(recon))
    mean = np.mean(magnitude)
    # the image will be considered "blurry" if the mean value of the
    # magnitudes is less than the threshold value
    return (mean, mean <= thresh)
pathImg = "path to the image"
image = cv2.imread(pathImg)
# Resizing the image to 500 pixels in width.
image = imutils.resize(image, width= 500)
# Converting the image to gray scale.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Using the FFT to detect blur.
(mean, blurry) = detect_blur_fft(gray, size=60)
image = np.dstack([gray] * 3)
# This is a conditional statement that will set the color to red if the image is blurry or green
color = (0, 0, 255) if blurry else (0, 255, 0)
text = "Blurry ({:.4f})" if blurry else "Not Blurry ({:.4f})"
text = text.format(mean)
# Adding text to the image.
cv2.putText(image, text, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
print("[INFO] {}".format(text))
# show the output image
display(pilImg.fromarray(image))
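One property of this score that may be related to the low-contrast failures: the high-pass reconstruction is linear in the pixel values, so the score depends on contrast as well as on sharpness. The sketch below uses NumPy only and synthetic noise images. `high_freq_score` is a hypothetical condensed version of the function above, with a small epsilon added to avoid `log(0)`.

```python
import numpy as np

def high_freq_score(gray, size):
    """Mean log-magnitude of the image after removing low frequencies."""
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    shifted = np.fft.fftshift(np.fft.fft2(gray))
    shifted[cy - size:cy + size, cx - size:cx + size] = 0  # drop low frequencies
    recon = np.fft.ifft2(np.fft.ifftshift(shifted))
    # epsilon is an addition for this sketch; it avoids log(0)
    return float(np.mean(20 * np.log(np.abs(recon) + 1e-8)))

rng = np.random.default_rng(0)
sharp = rng.uniform(0, 255, size=(128, 128))
# 3x3 mean blur built from shifted copies of the image
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
# same detail as `sharp`, but with only 10% of the contrast
low_contrast = sharp.mean() + 0.1 * (sharp - sharp.mean())

s_sharp = high_freq_score(sharp, 16)
s_blur = high_freq_score(blurred, 16)
s_low = high_freq_score(low_contrast, 16)
print(s_sharp, s_blur, s_low)
```

The blurred image scores lower than the sharp one, as intended, but so does the low-contrast image even though it is just as sharp: reducing the contrast by a factor of 10 lowers the score by 20·ln(10) ≈ 46. A fixed threshold can therefore flag a sharp but low-contrast photo as blurry. Normalizing the contrast before scoring might help, though that is untested here.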
I am working on an image enhancement use case where one of the tasks is to rescale an image to a 3:4 ratio. But rather than blindly resizing the image by calculating the height and width from the original image, I want it to be cropped. In other words, I want to discard boundary pixels so that the image matches the ratio without cutting into the primary object.
I have the segmentation mask using which I am getting the bounding box. I am also removing the background making it transparent for some other things. I am sharing both the binary mask and the original image.
I am using the below code to generate the box.
import cv2
import numpy as np
THRESHOLD = 0.9
mask = cv2.imread("mask.png")
mask = mask/255
mask[mask > THRESHOLD] = 1
mask[mask <= THRESHOLD] = 0
out_layer = mask[:,:,2]
x_starts = [np.where(out_layer[i]==1)[0][0] if len(np.where(out_layer[i]==1)[0])!=0 else out_layer.shape[0]+1 for i in range(out_layer.shape[0])]
x_ends = [np.where(out_layer[i]==1)[0][-1] if len(np.where(out_layer[i]==1)[0])!=0 else 0 for i in range(out_layer.shape[0])]
y_starts = [np.where(out_layer.T[i]==1)[0][0] if len(np.where(out_layer.T[i]==1)[0])!=0 else out_layer.T.shape[0]+1 for i in range(out_layer.T.shape[0])]
y_ends = [np.where(out_layer.T[i]==1)[0][-1] if len(np.where(out_layer.T[i]==1)[0])!=0 else 0 for i in range(out_layer.T.shape[0])]
startx = min(x_starts)
endx = max(x_ends)
starty = min(y_starts)
endy = max(y_ends)
start = (startx,starty)
end = (endx,endy)
If I understood your problem correctly, you just want to have the masking of the person in an image of size ratio 3:4 without cropping the mask. The approach you are talking about is possible but a bit unnecessary. I am sharing below the approach you can use with explanation and also I have used another approach to find the box. Use any approach you like.
import cv2
import numpy as np
MaskImg = cv2.imread("WomanMask.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("RuntimeImages/Input MaskImg.png", MaskImg)
ret, MaskImg = cv2.threshold(MaskImg, 20, 255, cv2.THRESH_BINARY)
cv2.imwrite("RuntimeImages/MaskImg after threshold.png", MaskImg)
# Finding biggest contour in the image
# (Assuming that the woman mask will cover the biggest area of the mask image)
# Getting all external contours
Contours = cv2.findContours(MaskImg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
# exit if no white pixel in the image (no contour found)
if len(Contours) == 0:
    print("There was no white pixel in the image.")
    exit()
# Sorting contours in decreasing order according to their area
Contours = sorted(Contours, key=lambda x:cv2.contourArea(x), reverse=True)
# Getting the biggest contour
BiggestContour = Contours[0] # This is the contour of the girl mask
# Finding the bounding rectangle
BB = cv2.boundingRect(BiggestContour)
print(f"Bounding rectangle : {BB}")
# Getting the position, width, and height of the woman mask
x, y = BB[0], BB[1]
Width, Height = BB[2], BB[3]
# Setting the (height / width) ratio required
Ratio = ( 3 / 4 ) # 3 : 4 :: Height : Width
# Getting the new dimensions of the image to fit the mask
# (compare Height against Width * Ratio so the new image always contains the mask)
if (Height > Width * Ratio):
    NewHeight = Height
    NewWidth = int(NewHeight / Ratio)
else:
    NewWidth = Width
    NewHeight = int(NewWidth * Ratio)
# Getting the position of the woman mask in this new image
# It will be placed at the center
X = int((NewWidth - Width) / 2)
Y = int((NewHeight - Height) / 2)
# Creating the new image with the woman mask at the center
NewImage = np.zeros((NewHeight, NewWidth), dtype=np.uint8)
NewImage[Y : Y+Height, X : X+Width] = MaskImg[y : y+Height, x : x+Width]
cv2.imwrite("RuntimeImages/Final Image.png", NewImage)
Below is the final output mask image
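The canvas-size computation can also be isolated as a small pure function, which makes it easy to check on a few boxes. This is a sketch with a made-up name (`fit_canvas`). It compares the box height against width × ratio, so the canvas always contains the box, and rounds up rather than down.

```python
import math

def fit_canvas(box_w, box_h, ratio_h_over_w):
    """Smallest canvas with the given height/width ratio that contains the box.

    Returns (canvas_w, canvas_h, x_offset, y_offset), where the offsets
    centre the box inside the canvas.
    """
    if box_h > box_w * ratio_h_over_w:
        # box is too tall for the ratio: keep the height, widen the canvas
        new_h = box_h
        new_w = math.ceil(new_h / ratio_h_over_w)
    else:
        # box is too wide for the ratio: keep the width, heighten the canvas
        new_w = box_w
        new_h = math.ceil(new_w * ratio_h_over_w)
    return new_w, new_h, (new_w - box_w) // 2, (new_h - box_h) // 2

print(fit_canvas(100, 80, 3 / 4))   # (107, 80, 3, 0)
print(fit_canvas(200, 90, 3 / 4))   # (200, 150, 0, 30)
```

For a portrait target (height larger than width), pass the ratio as height over width, for example `4 / 3`.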
I'm trying to quantize an image keeping all primary colors in place and removing all minor colors such as "anti-aliasing" borders.
E.g. the image below ultimately should be quantized to 3 colors whereas the number of actual colors in the original image is more than 30. All "anti-aliasing" border colors should be considered minors and eliminated upon quantization as well as "jpeg artifacts", which add more colors to the image because of over-optimization.
Note: a source image could be either png or jpeg.
For the quantization itself, I'm using PIL.quantize(...) with K as the number of colors to leave. And it works fairly well and keeps the palette perfectly matching to the original.
import cv2
import numpy as np
from PIL import Image
def color_quantize(path, K):
    image = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    im_pil = Image.fromarray(np.uint8(img))
    # reduce the image to K palette colors
    im_pil = im_pil.quantize(K, None, 0, None)
    return cv2.cvtColor(np.array(im_pil.convert("RGB")), cv2.COLOR_RGB2BGR)
Thus, if I knew "K" (the number of primary colors) in advance, then I would use it for im_pil.quantize(...). Basically, I need a way to get that "K" number.
Is there any way to determine the number of primary colors?
BTW, regarding the "jpeg artifacts" removal, I'm using img = cv2.bilateralFilter(img, 9, 75, 75) at the moment, which works quite well.
You may want to try to analyze the histograms of the RGB channels to find out how many peaks they have, hopefully you will have a few big peaks, and some very small ones, then the number of big peaks should be your K.
I've ended up with the following function to determine the number of dominant colors:
def get_dominant_color_number(img, threshold):
    # remove significant artifacts
    img = cv2.bilateralFilter(img, 9, 75, 75)
    # resize image to make the process more efficient on 250x250 (without antialiasing to reduce color space)
    thumbnail = cv2.resize(img, (250, 250), None)
    # convert to HSV color space
    imghsv = cv2.cvtColor(thumbnail, cv2.COLOR_BGR2HSV).astype("float32")
    (h, s, v) = cv2.split(imghsv)
    # quantize saturation and value to merge close colors
    v = (v // 30) * 30
    s = (s // 30) * 30
    imghsv = cv2.merge([h,s,v])
    thumbnail = cv2.cvtColor(imghsv.astype("uint8"), cv2.COLOR_HSV2BGR)
    (unique, counts) = np.unique(thumbnail.reshape(-1, thumbnail.shape[2]), return_counts=True, axis = 0)
    # calculate the frequency of each color and sort them
    freq = counts.astype("float")
    freq /= freq.sum()
    count_sort_ind = np.argsort(-counts)
    # get frequent colors above the specified threshold
    n = 0
    dominant_colors = []
    for c in count_sort_ind:
        n += 1
        dominant_colors.append(unique[c])
        if freq[c] <= threshold:
            break
    return (dominant_colors, n)
# -----------------------------------------------------------
img = cv2.imread("File.png", cv2.IMREAD_UNCHANGED)
channels = img.shape[2]
if channels == 4:
    trans_mask = img[:,:,3] == 0
    img[trans_mask] = [254, 253, 254, 255]
    img = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)
(dom_colors, dom_color_num) = get_dominant_color_number(img, .0045)
For the threshold ".0045" it gives an acceptable result. Yet, it still looks a bit "artificial".
I am new to openCV and I was wondering if there is a way to remove the periodic stripes in the lower half of this image.
I looked at this post but couldn't quite understand what was going on: Removing periodic noise from an image using the Fourier Transform
Here is how to mitigate (reduce, but not totally eliminate) the lines using Fourier Transform and notch filtering processing with Python/OpenCV/Numpy. Since the horizontal lines in the input are very close, there will be horizontal linear structures spaced far apart in the Fourier Transform spectrum. So what I did was:
Read the input
Pad with the mean value to powers of 2 size (to try to mitigate any ringing from the discontinuity with the padding)
Do the DFT
Compute the spectrum image from the magnitude
Threshold the image and draw a black horizontal line through the center to blank out the bright DC component
Find where the bright spots (lines) show.
Get the coordinates of the bright spots and draw white horizontal lines on the thresholded image to form a mask
Apply the mask to the magnitude image
Do the IDFT
Crop back to the original size and normalize to the same dynamic range as the original image
Input:
import numpy as np
import cv2
import math
# read input as grayscale
img = cv2.imread('pattern_lines.png', 0)
hh, ww = img.shape
# get min and max and mean values of img
img_min = np.amin(img)
img_max = np.amax(img)
img_mean = int(np.mean(img))
# pad the image to dimension a power of 2
hhh = math.ceil(math.log2(hh))
hhh = int(math.pow(2,hhh))
www = math.ceil(math.log2(ww))
www = int(math.pow(2,www))
imgp = np.full((hhh,www), img_mean, dtype=np.uint8)
imgp[0:hh, 0:ww] = img
# convert image to floats and do dft saving as complex output
dft = cv2.dft(np.float32(imgp), flags = cv2.DFT_COMPLEX_OUTPUT)
# apply shift of origin from upper left corner to center of image
dft_shift = np.fft.fftshift(dft)
# extract magnitude and phase images
mag, phase = cv2.cartToPolar(dft_shift[:,:,0], dft_shift[:,:,1])
# get spectrum
spec = np.log(mag) / 20
min, max = np.amin(spec, (0,1)), np.amax(spec, (0,1))
# threshold the spectrum to find bright spots
thresh = (255*spec).astype(np.uint8)
thresh = cv2.threshold(thresh, 155, 255, cv2.THRESH_BINARY)[1]
# cover the center rows of thresh with black
yc = hhh // 2
cv2.line(thresh, (0,yc), (www-1,yc), 0, 5)
# get the y coordinates of the bright spots
points = np.column_stack(np.nonzero(thresh))
print(points)
# create mask from spectrum drawing horizontal lines at bright spots
mask = thresh.copy()
for p in points:
    y = p[0]
    cv2.line(mask, (0,y), (www-1,y), 255, 5)
# apply mask to magnitude such that magnitude is made black where mask is white
mag[mask!=0] = 0
# convert new magnitude and old phase into cartesian real and imaginary components
real, imag = cv2.polarToCart(mag, phase)
# combine cartesian components into one complex image
back = cv2.merge([real, imag])
# shift origin from center to upper left corner
back_ishift = np.fft.ifftshift(back)
# do idft saving as complex output
img_back = cv2.idft(back_ishift)
# combine complex components into original image again
img_back = cv2.magnitude(img_back[:,:,0], img_back[:,:,1])
# crop to original size
img_back = img_back[0:hh, 0:ww]
# re-normalize to 8-bits in range of original
min, max = np.amin(img_back, (0,1)), np.amax(img_back, (0,1))
notched = cv2.normalize(img_back, None, alpha=img_min, beta=img_max, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
cv2.imshow("ORIGINAL", img)
cv2.imshow("PADDED", imgp)
cv2.imshow("MAG", mag)
cv2.imshow("PHASE", phase)
cv2.imshow("SPECTRUM", spec)
cv2.imshow("THRESH", thresh)
cv2.imshow("MASK", mask)
cv2.imshow("NOTCHED", notched)
cv2.waitKey(0)
cv2.destroyAllWindows()
# write result to disk
cv2.imwrite("pattern_lines_spectrum.png", (255*spec).clip(0,255).astype(np.uint8))
cv2.imwrite("pattern_lines_thresh.png", thresh)
cv2.imwrite("pattern_lines_mask.png", mask)
cv2.imwrite("pattern_lines_notched.png", notched)
Spectrum (note the bright spots in the middle at y=64 and 192):
Threshold Image:
Bright Spot Locations:
[[ 0 1023]
[ 0 1024]
[ 0 1025]
[ 1 1024]
[ 64 1024]
[ 65 1024]
[ 191 1024]
[ 192 1024]
[ 255 1024]]
Mask:
Result:
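The same notch-filtering idea can be checked on synthetic data with NumPy only. This sketch is not the code above: the image is random noise standing in for real content, the stripe frequency (8 cycles over the height) is chosen by hand, and the two spectrum rows are notched directly instead of being found by thresholding.

```python
import numpy as np

h, w = 64, 64
y = np.arange(h).reshape(-1, 1)
rng = np.random.default_rng(0)
clean = rng.uniform(100, 150, size=(h, w))       # synthetic stand-in content
stripes = 30 * np.sin(2 * np.pi * 8 * y / h)     # horizontal stripes, 8 cycles
noisy = clean + stripes                          # stripes broadcast across columns

# centred spectrum: horizontal stripes show up on rows cy - 8 and cy + 8
spectrum = np.fft.fftshift(np.fft.fft2(noisy))
cy = h // 2
mask = np.ones((h, w))
for row in (cy - 8, cy + 8):
    mask[row, :] = 0                             # notch out the whole row
restored = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real

err_before = np.abs(noisy - clean).mean()
err_after = np.abs(restored - clean).mean()
print(err_before, err_after)
```

The error drops sharply but not to zero, because the notched rows also contain a small part of the real image content. This is the same reason the method above only mitigates the lines in the real photo.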