Sharpen border pixels of each object in an image - Python

I have an image with some random components. The border of each component has some blurred pixels (see screenshot).
Using OpenCV and Python, I want this image to be really sharp, i.e. with no blurry pixels on the edges, as shown in the image below.

Within a certain tolerance for error, this behaviour can be achieved using the k-means algorithm, which builds clusters of pixels having similar color. The OpenCV k-means implementation provides a lot of parameters to tweak to get the desired results. You may use the following code snippet to start with:
import cv2
import numpy as np

img = cv2.imread("/path/to/img.png")

# Flatten the image into a list of BGR pixels and convert to float32 for kmeans
Z = img.reshape((-1, 3))
Z = np.float32(Z)

# Stop after 10 iterations or when cluster centers move less than 1.0
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 2
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel with its cluster center and restore the image shape
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape(img.shape)
cv2.imwrite("./output.png", res2)
The above code will generate the following results for the sample image:
But this code may not work very well with the larger image (even after passing K=5), because we are depending on k-means to pick the seeds by random sampling. We have the option of passing in the seed colors to look for, which in your case would be the BGR values for yellow, light green, dark green, brown and blue. After supplying these BGR values as seeds, you can get better results, as in the sketch below.
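A rough sketch of that idea: OpenCV's kmeans does not accept explicit seed centers, so this uses scikit-learn's KMeans with init= instead; the BGR seed values below are illustrative placeholders, not taken from your image.
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("/path/to/img.png")
Z = np.float32(img.reshape(-1, 3))

# Illustrative BGR seeds; replace with the actual colors in your image
seeds = np.float32([
    [0, 255, 255],    # yellow
    [144, 238, 144],  # light green
    [0, 100, 0],      # dark green
    [42, 42, 165],    # brown
    [255, 0, 0],      # blue
])

# n_init=1 keeps the supplied seeds instead of re-running with random inits
clt = KMeans(n_clusters=len(seeds), init=seeds, n_init=1).fit(Z)
centers = np.uint8(clt.cluster_centers_)
res = centers[clt.labels_].reshape(img.shape)
cv2.imwrite("./output.png", res)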

Related

How to determine how many distinct curves are in an image using Python?

I am trying to write an algorithm to systematically determine how many different "curves" are in an image. Example Image. I'm specifically interested in the white lines here, so I've used a color threshold to mask the rest of the image and only get the white pixels. These lines represent a path run by a player (wide receivers in the NFL), so I'm interested in the x and y coordinates that the path represents - and each "curve" represents a different path that the player took (or "route"). All curves should start on or behind the blue line.
However, while I can get just the white pixels, I can't figure out how to systematically identify the separate curves. In this example image, there are 8 white curves (or routes) present. I've identified those curves in this image. I tried edge detection, and then using scipy ndimage to get the number of connected components, but because the curves overlap it counts them as connected and only gives me 3 labeled components for this image as opposed to eight. Here's what the edge detection output looks like. Is there a better way to go about this? Here is my sample code.
import cv2
from skimage.morphology import skeletonize
import numpy as np
from scipy import ndimage

# Read in image
image = cv2.imread('example_image.jpeg')

# Color boundary to get white pixels
lower_white = np.array([230, 230, 230])
upper_white = np.array([255, 255, 255])

# Mask image for white pixels
mask = cv2.inRange(image, lower_white, upper_white)
c_pixels = cv2.bitwise_and(image, image, mask=mask)

# Clip pixels to the 0-1 range expected by skeletonize
c_pixels = c_pixels.clip(0, 1)
ske_c = skeletonize(c_pixels[:, :, 1]).astype(np.uint8)

# Edge detection
inputImage = ske_c * 255
edges = cv2.Canny(inputImage, 100, 200, apertureSize=7)

# Show edges
cv2.imshow('edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Smooth the image (to remove small objects), then threshold
edgesf = ndimage.gaussian_filter(edges, 1)
T = 50  # set threshold by hand to avoid installing `mahotas` or
        # `scipy.stsci.image` dependencies that have threshold() functions

# Find connected components
labeled, nr_objects = ndimage.label(edgesf > T)
print("Number of objects is %d" % nr_objects)

How to generate separate bounding boxes for non-contiguous photo color mask areas

I want to generate bounding boxes for approximately non-contiguous areas of a photo by color mask--in this case, green bands containing vegetation--so that I can clip that area to pass to an image classification function.
This is something that can be done with geospatial rasters fairly easily using GDAL, to polygonize areas of geotiff that have similar characteristics (https://www.gdal.org/gdal_polygonize.html). But in this case, I am trying to do this to a photo. I haven't found a solution for pure rasters.
For example, take a photo like this one:
which is masked into green bands using OpenCV and numpy:
try:
    hsv = cv.cvtColor(image, cv.COLOR_BGR2HSV)
except:
    print("File may be corrupt")
    return (0, 0)

# Define lower and upper limits of what we call "brown" and "green"
brown_lo = np.array([18, 0, 0])
brown_hi = np.array([28, 255, 255])
green_lo = np.array([29, 0, 0])
green_hi = np.array([88, 255, 255])

# Mask image to select only browns and greens
mask_brown = cv.inRange(hsv, brown_lo, brown_hi)
mask_green = cv.inRange(hsv, green_lo, green_hi)

# Flatten each masked region to a single hue
hsv[mask_brown > 0] = (18, 255, 255)
hsv[mask_green > 0] = (53, 255, 255)

image2 = cv.cvtColor(hsv, cv.COLOR_HSV2BGR)
cv.imwrite(QUERIES + 'queries/mask.jpg', image2)
I'd like to generate bounding boxes or polygons for the areas indicated here:
Any ideas how to do that?
I tried using the OpenCV contour and convex hull algorithms, but they didn't really get me anywhere:
threshold = val

# Detect edges using Canny
canny_output = cv.Canny(src_gray, threshold, threshold * 2)

# Find contours
_, contours, _ = cv.findContours(canny_output, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)

# Find the convex hull object for each contour
hull_list = []
for i in range(len(contours)):
    hull = cv.convexHull(contours[i])
    hull_list.append(hull)

# Draw contours + hull results
drawing = np.zeros((canny_output.shape[0], canny_output.shape[1], 3), dtype=np.uint8)
for i in range(len(contours)):
    color = (rng.randint(0, 256), rng.randint(0, 256), rng.randint(0, 256))
    cv.drawContours(drawing, contours, i, color)
    cv.drawContours(drawing, hull_list, i, color)

# Show in a window
cv.imshow('Contours', drawing)
So, since you tried contours and they didn't work, I tried going the clustering way. I started off with K-means, which is the best place to start IMO. Here is the code:
import cv2
import numpy as np
from sklearn.cluster import KMeans, MeanShift
import matplotlib.pyplot as plt

def centroid_histogram(clt):
    # Histogram of cluster assignments, normalized to proportions
    numlabels = np.arange(0, len(np.unique(clt.labels_)) + 1)
    (hist, _) = np.histogram(clt.labels_, bins=numlabels)
    hist = hist.astype("float")
    hist /= hist.sum()
    return hist

def plot_colors(hist, centroids):
    # Draw a 300x50 bar where each color's width matches its proportion
    bar = np.zeros((50, 300, 3), dtype="uint8")
    startX = 0
    for (percent, color) in zip(hist, centroids):
        endX = startX + (percent * 300)
        cv2.rectangle(bar, (int(startX), 0), (int(endX), 50),
                      color.astype("uint8").tolist(), -1)
        startX = endX
    return bar

image1 = cv2.imread('mean_shift.jpg')
image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2RGB)

# Flatten to a list of RGB pixels
image = image1.reshape((image1.shape[0] * image1.shape[1], 3))

#clt = MeanShift(bandwidth=2, bin_seeding=True)
clt = KMeans(n_clusters=3)
clt.fit(image)

hist = centroid_histogram(clt)
bar = plot_colors(hist, clt.cluster_centers_)

plt.figure()
plt.axis("off")
plt.imshow(bar)
plt.show()
Using 3 cluster centers, I got the following result:
and using 6 cluster centers:
which basically shows you the proportion of these colors in the image.
The links that helped me do this:
https://www.pyimagesearch.com/2014/05/26/opencv-python-k-means-color-clustering/ and
https://github.com/log0/build-your-own-meanshift/blob/master/Meanshift%20Image%20Segmentation.ipynb
Now, there are a couple of issues I see here:
You might not know the number of clusters in all the images. In that case, you should look into Mean-Shift. Unlike K-means, mean-shift does not require specifying the number of clusters in advance; the number of clusters is determined by the algorithm from the data (see the sketch below).
I have used SLIC for these kinds of problems. It is a K-means-based superpixel algorithm and is very efficient. You can also give it a try since it is available in scikit-image, which is a go-to library for image processing in Python IMO. In the same direction, you could also give this a try.
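A minimal Mean-Shift sketch along those lines, reusing the image file from the code above; the quantile and sample count are just starting values to tune:
import cv2
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

img = cv2.imread('mean_shift.jpg')
pixels = img.reshape(-1, 3).astype(np.float64)

# Estimate a bandwidth from a subsample of pixels instead of fixing K
bandwidth = estimate_bandwidth(pixels, quantile=0.1, n_samples=500)
clt = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(pixels)
print("clusters found:", len(clt.cluster_centers_))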
Hope I have helped you some way! Cheers!

How to Improve OCR on image with text in different colors and fonts?

I'm using the Google Vision API to extract the text from some pictures; however, I have been trying to improve the accuracy (confidence) of the results with no luck.
Every time I change the image from the original, I lose accuracy in detecting some characters.
I have isolated the issue to the use of multiple colors for different words: as can be seen, words in red, for example, have incorrect results more often than the other words.
Example:
Some variations on the image, from grayscale to b&w:
What ideas can I try to make this work better? Specifically, changing the colors of the text to a uniform color, or just black on a white background, since most algorithms expect that?
Some ideas I have already tried, along with some thresholding:
from PIL import ImageOps, ImageEnhance

dimg = ImageOps.grayscale(im)
cimg = ImageOps.invert(dimg)
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)
sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)
I can only offer a butcher's solution, potentially a nightmare to maintain.
In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.
My prerequisites:
- I knew exactly in which area of the screen the text was going to go.
- I knew exactly which fonts and colors were going to be used.
- The text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
- I could not detect text changes reliably enough to average frames and reduce the interference.
What I did:
- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the text color, then access the whole set of bitmaps generated from characters in all available fonts in that color. It would then determine which bitmap was closest to the corresponding rectangle on the screen, and advance to the next position.
(Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)
In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
You have tried almost every standard step. I would advise you to try some of PIL's built-in filters, like the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together, as in the sketch below.
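A minimal sketch of that split/merge idea, assuming an input file input.png and a fixed threshold of 128 (tune per channel as needed):
from PIL import Image

im = Image.open('input.png').convert('RGB')
# binarise each colour channel separately, then recombine
channels = [ch.point(lambda p: 255 if p > 128 else 0) for ch in im.split()]
merged = Image.merge('RGB', channels)
merged.save('binarised.png')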
Or convert your image to YUV and then use just the Y channel for further processing, for example:
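A short sketch, assuming a BGR image loaded with OpenCV:
import cv2

img = cv2.imread('input.png')
yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
y, u, v = cv2.split(yuv)
# y is the luminance channel; feed it to the rest of the pipeline
cv2.imwrite('luma.png', y)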
Also, if you do not have a monochrome background, consider performing some background subtraction.
When detecting scanned text, Tesseract likes frames removed, so try to remove as much non-character space from the image as you can (you might need to keep the picture size, though, so replace it with white). Tesseract also likes straight lines, so some deskewing might be in order if your text is recorded at an angle. Tesseract also sometimes gives better results if you resize the image to twice its original size.
I suspect that Google Vision uses Tesseract, or portions of it, but I have no idea what other preprocessing it does for you. So some of the advice here might already be implemented, and doing it again would be unnecessary and repetitive. A rough sketch of the resize and deskew steps follows.
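This sketch estimates the text angle from the minimum-area rectangle around dark pixels, which only works when there is a single dominant skew; the file name and threshold are assumptions:
import cv2
import numpy as np

img = cv2.imread('scan.png', cv2.IMREAD_GRAYSCALE)

# Tesseract often does better on a 2x upscaled image
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# crude deskew: angle of the minimum-area rectangle around dark pixels
coords = np.column_stack(np.where(img < 128)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle
h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
img = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC, borderValue=255)
cv2.imwrite('deskewed.png', img)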
You will need to pre-process the image more than once, and use a bitwise_or operation to combine the results. To extract the colors, you could use
import cv2
import numpy as np

boundaries = [  # BGR colorspace for opencv, *not* RGB
    ([15, 15, 100], [50, 60, 200]),    # red
    ([85, 30, 2], [220, 90, 50]),      # blue
    ([25, 145, 190], [65, 175, 250]),  # yellow
]

for (low, high) in boundaries:
    low = np.array(low, dtype="uint8")
    high = np.array(high, dtype="uint8")
    # find the colors within the specified boundaries and apply the mask
    mask = cv2.inRange(image, low, high)
    bitWise = cv2.bitwise_and(image, image, mask=mask)
    # now here is the image masked with the specific color boundary...
Once you have the masked image, you can do another bitwise_or operation on your to-be "final" image, essentially adding this mask to it, as in the sketch below.
This specific implementation requires OpenCV, but the same principle applies to other image packages.
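A minimal sketch of that combining step, reusing image and boundaries from the snippet above:
import cv2
import numpy as np

combined = np.zeros_like(image)
for (low, high) in boundaries:
    mask = cv2.inRange(image, np.array(low, "uint8"), np.array(high, "uint8"))
    masked = cv2.bitwise_and(image, image, mask=mask)
    # accumulate each color's pixels into the final image
    combined = cv2.bitwise_or(combined, masked)
cv2.imwrite('combined.png', combined)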
I need a little more context on this.
How many calls are you going to make to the Google Vision API? If you are doing this throughout a whole stream, you'll probably need a paid subscription.
What are you going to do with this data? How accurate does the OCR need to be?
Assuming you get this snapshot from someone else's Twitch stream, with the streamer's video compression and network connectivity in the way, you're going to get a pretty blurry snapshot, so OCR is going to be pretty tough.
The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get it high enough for accurate OCR. If you are set on OCR, here is one approach you could try:
Binarize the image to get the non-red text in white and the background in black, as in your binarized image:
from PIL import Image

def binarize_image(im, threshold):
    """Binarize an image."""
    image = im.convert('L')  # convert image to monochrome
    bin_im = image.point(lambda p: p > threshold and 255)
    return bin_im

im = Image.open("game_text.JPG")
binarized = binarize_image(im, 100)
Extract only the red text values with a filter, then binarize it:
import cv2
import numpy as np
from matplotlib import pyplot as plt

lower = np.array([15, 15, 100], dtype="uint8")
upper = np.array([50, 60, 200], dtype="uint8")

# cv2 expects a BGR numpy array, so convert the PIL image first
im_bgr = cv2.cvtColor(np.array(im), cv2.COLOR_RGB2BGR)
mask = cv2.inRange(im_bgr, lower, upper)
red_binarized = cv2.bitwise_and(im_bgr, im_bgr, mask=mask)

plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
plt.show()
However, even with this filtering, it still doesn't extract the red well.
Add the images obtained in (1.) and (2.):
# both must be single-channel numpy arrays; bitwise_or avoids uint8 overflow
combined_image = cv2.bitwise_or(np.array(binarized), cv2.cvtColor(red_binarized, cv2.COLOR_BGR2GRAY))
Do OCR on (3.)
This is not a full solution, but it may lead to something better.
By converting your data from BGR (or RGB) to CIE Lab, you can process a grayscale image as the weighted sum of the colour channels a* and b*.
This grayscale image will enhance the colour regions of the text.
By adapting the threshold, you can segment the coloured words in your original image from this grayscale image, and get the other words by thresholding the L channel.
A bitwise AND operator should be enough to merge the two segmentation images.
If you can get an image with better contrast, a very last step could be a filling based on the contours.
For that, take a look at the RETR_FLOODFILL mode of the function cv2.findContours.
Any other hole-filling function from another package may also fit that purpose.
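For instance, a one-line hole fill with scipy, assuming binary_mask is a boolean mask from one of the thresholding steps below:
import numpy as np
from scipy import ndimage

# fill enclosed holes in a binary character mask (binary_mask is assumed)
filled = ndimage.binary_fill_holes(binary_mask).astype(np.uint8) * 255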
Here is some code that shows the first part of my idea:
import cv2
import numpy as np
from matplotlib import pyplot as plt

I = cv2.UMat(cv2.imread('/home/smile/QSKN.png', cv2.IMREAD_ANYCOLOR))

# Convert to CIE Lab and take the mean of the a* and b* channels
Lab = cv2.cvtColor(I, cv2.COLOR_BGR2Lab)
L, a, b = cv2.split(Lab)
Ig = cv2.addWeighted(cv2.UMat(a), 0.5, cv2.UMat(b), 0.5, 0, dtype=cv2.CV_32F)
Ig = cv2.normalize(Ig, None, 0., 255., cv2.NORM_MINMAX, cv2.CV_8U)

#k = np.ones((3,3),np.float32)
#k[2,2] = 0
#k*=-1
#
#Ig = cv2.filter2D(Ig,cv2.CV_32F,k)
#Ig = cv2.absdiff(Ig,0)
#Ig = cv2.normalize(Ig,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)

# Otsu thresholds for the colour image and for the lightness channel
_, Ib = cv2.threshold(Ig, 0., 255., cv2.THRESH_OTSU)
_, Lb = cv2.threshold(cv2.UMat(L), 0., 255., cv2.THRESH_OTSU)

_, ax = plt.subplots(2, 2)
ax[0, 0].imshow(Ig.get(), cmap='gray')
ax[0, 1].imshow(L, cmap='gray')
ax[1, 0].imshow(Ib.get(), cmap='gray')
ax[1, 1].imshow(Lb.get(), cmap='gray')
plt.show()
import numpy as np
from skimage.morphology import selem
from skimage.filters import rank, threshold_otsu
from skimage.util import img_as_float
from PIL import ImageGrab
import matplotlib.pyplot as plt

def preprocessing(image, strelem, s0=30, s1=30, p0=.3, p1=1.):
    # Edge-preserving smoothing, then keep only regions above Otsu's threshold
    image = rank.mean_bilateral(image, strelem, s0=s0, s1=s1)
    condition = (lambda x: x > threshold_otsu(x))(rank.maximum(image, strelem))
    # Local percentile-based contrast normalization
    normalize_image = rank.autolevel_percentile(image, strelem, p0=p0, p1=p1)
    return np.where(condition, normalize_image, 0)

# Grab image from clipboard
image = np.array(ImageGrab.grabclipboard())
sel = selem.disk(4)
a = sum([img_as_float(preprocessing(image[:, :, x], sel, p0=0.3)) for x in range(3)]) / 3

fig, ax = plt.subplots(1, 2, sharey=True, sharex=True)
ax[0].imshow(image)
ax[1].imshow(rank.autolevel_percentile(a, sel, p0=.4))
plt.show()
This is my code for clearing text of noise and creating uniform brightness for the characters.
With minor modifications, I used it to solve your problem.

K-means color clustering - omit background pixels with masked numpy arrays

I'm trying to find the 3 dominant colours of several images using K-means clustering. The problem I'm facing is that K-means also clusters the background of the image. I am using Python 2.7 and OpenCV 3.
All images have the same grey background with the following RGB colour: 150,150,150. To avoid K-means clustering the background colour as well, I created a masked array which masks all '150' pixel values in the original image array, theoretically leaving only the non-background pixels for K-means to work with. However, when I run my script, it still returns grey as one of the dominant colours.
My question: is a masked array the way to go (did I do something wrong?), or are there better alternatives to somehow exclude pixels from K-means clustering?
Please find my code below:
from sklearn.cluster import KMeans
from sklearn import metrics
import cv2
import numpy as np

def centroid_histogram(clt):
    # Histogram of cluster assignments, normalized to proportions
    numLabels = np.arange(0, len(np.unique(clt.labels_)) + 1)
    (hist, _) = np.histogram(clt.labels_, bins=numLabels)
    hist = hist.astype("float")
    hist /= hist.sum()
    return hist

image = cv2.imread("test1.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Resize so the longest side is 100 pixels
h, w, _ = image.shape
w_new = int(100 * w / max(w, h))
h_new = int(100 * h / max(w, h))
image = cv2.resize(image, (w_new, h_new))

# Flatten to a list of pixels and mask out the background value
image_array = image.reshape((image.shape[0] * image.shape[1], 3))
image_array = np.ma.masked_values(image_array, 150)

clt = KMeans(n_clusters=3)
clt.fit(image_array)

# Sort clusters by their share of pixels, largest first
hist = centroid_histogram(clt)
zipped = zip(hist, clt.cluster_centers_)
zipped.sort(reverse=True, key=lambda x: x[0])
hist, clt.cluster_centers_ = zip(*zipped)
print(clt.cluster_centers_)
If you want to extract the values of the pixels other than your background, you can use numpy indexing:
# keep only rows (pixels) where at least one channel differs from 150
mask = np.any(image_array != [150, 150, 150], axis=1)
img2 = image_array[mask]
This will yield the list of pixels which are not [150, 150, 150].
However, it does not preserve the structure of the image; it just gives you the list of pixel values. I can't really remember, but maybe for K-means you need to give the whole image, i.e. you also need to feed it the positions of the pixels? But in that case, no masking will ever help, because masking just replaces the values of certain pixels with others; it doesn't get rid of pixels altogether.
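For the dominant-colour use case, the flat pixel list is actually all K-means needs (positions are not used), so something like this sketch should work on img2:
from sklearn.cluster import KMeans

clt = KMeans(n_clusters=3)
clt.fit(img2)
# cluster centers are the dominant colours, background excluded
print(clt.cluster_centers_)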

reduce image to N colors in OpenCV Python

I can only ever find examples in C/C++, and they never seem to map well to the OpenCV Python API. I'm loading video frames (both from files and from a webcam) and want to reduce them to 16 colors, but mapped to a 24-bit RGB color space (this is what my output requires: a giant LED display).
I read the data like this:
ret, frame = self._vid.read()
image = cv2.cvtColor(frame, cv2.COLOR_RGB2BGRA)
I did find the Python example below, but cannot figure out how to map it to the type of output data I need:
import numpy as np
import cv2

img = cv2.imread('home.jpg')
Z = img.reshape((-1, 3))

# convert to np.float32
Z = np.float32(Z)

# define criteria, number of clusters (K) and apply kmeans()
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Now convert back into uint8, and make original image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))
cv2.imshow('res2', res2)
cv2.waitKey(0)
cv2.destroyAllWindows()
That obviously works in the OpenCV image viewer, but trying to do the same fails in my output code, since I need an RGB or RGBA format. My output works like this:
for y in range(self.height):
    for x in range(self.width):
        self._led.set(x, y, tuple(image[y, x][0:3]))
Each color is represented as an (r,g,b) tuple.
Any thoughts on how to make this work?
I think the following could be faster than kmeans, especially with K = 16 (a sketch follows the steps):
- Convert the color image to gray.
- Contrast-stretch this gray image so that the resulting gray levels are between 0 and 255 (use normalize with NORM_MINMAX).
- Calculate the histogram of the stretched gray image using 16 bins (calcHist).
- Now you can modify these 16 histogram values. For example, you can sort and assign ranks (say 0 to 15), or assign 16 uniformly distributed values between 0 and 255 (I think the latter could give you a consistent output for a video).
- Backproject this histogram onto the stretched gray image (calcBackProject).
- Apply a color map to this backprojected image (you might want to scale the backprojected image before applying a colormap, using applyColorMap).
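A sketch of those steps with the uniform-values option: the 16 histogram bin values are replaced by 16 evenly spaced gray levels, so backprojection maps each pixel to the value assigned to its bin (img is an assumed BGR input frame):
import cv2
import numpy as np

img = cv2.imread('frame.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# contrast stretch to the full 0-255 range
stretched = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)

# use 16 uniformly spaced values as the "histogram", then backproject:
# each pixel becomes the value assigned to its 16-level bin
levels = np.linspace(0, 255, 16).astype(np.float32).reshape(-1, 1)
quantized = cv2.calcBackProject([stretched], [0], levels, [0, 256], scale=1)

# map the 16 gray levels to colors
colored = cv2.applyColorMap(quantized, cv2.COLORMAP_JET)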
Tip for kmeans:
If you are using kmeans on video, you can use the cluster centers from the previous frame as the initial positions for kmeans on the current frame. That way it will take less time to converge, so kmeans on subsequent frames will most probably run faster. A sketch follows.
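Note that OpenCV's kmeans is warm-started through initial labels rather than centers, so the sketch below passes the previous frame's labels (Z and prev_labels are assumed to come from your frame loop):
import cv2
import numpy as np

# Z: float32 pixel array of the current frame
# prev_labels: int32 labels returned by kmeans on the previous frame
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret, labels, centers = cv2.kmeans(Z, 16, prev_labels, criteria, 1,
                                  cv2.KMEANS_USE_INITIAL_LABELS)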
You can speed up your processing by applying k-means to a downscaled version of your image. This will give you the cluster centroids. You can then quantize each pixel of the original image by picking the closest centroid, for example:
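A sketch of that, assuming a BGR frame img and 16 clusters; the brute-force distance matrix is fine for moderate image sizes:
import cv2
import numpy as np

small = cv2.resize(img, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_AREA)
Z = np.float32(small.reshape(-1, 3))
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, _, centers = cv2.kmeans(Z, 16, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# quantize every pixel of the full-size image to its nearest centroid
pixels = np.float32(img.reshape(-1, 3))
dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
quantized = np.uint8(centers)[dists.argmin(axis=1)].reshape(img.shape)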
